Posted by Gigi Sayfan
on May 18, 2016
Software is infamously hard. This notion dates back to the software crisis of the late 1960s and 70s and to Fred Brooks's "No Silver Bullet" paper from 1986. The bigger the system, the more complicated it is to build software for it. The traditional trajectory is to build a system to spec and then watch it decay and rot over time until it is impossible to add new features or fix bugs due to the overwhelming complexity.
But, it doesn't have to be this way. Robust software (per my own definition) is a software system that gets better and better over time. Its architecture becomes simpler and more generic as it incorporates more real world use cases. Its test suite gets more comprehensive as it checks for more combinations of inputs and environment. Its real-world performance improves as insights into usage patterns allow for pragmatic optimizations. Its technology stack and third-party dependencies are getting upgraded to take advantage of their improvements. The team gets more familiar with the general architecture of the system and the business domain (working on such a system is a pleasure so churn will be low). The operations team gathers experience, automates more and more processes and builds or incorporates existing tools to manage the system.
The team develops APIs to expose the functionality in a loosely coupled way and integrates with external systems. This may sound like a pipe dream to some of you, but it is possible. It takes a lot of commitment and confidence, but the cool thing is that if you're able to follow this route you'll produce software that is not only high quality but also fast to develop and easy to adapt to different business needs. It does take a significant amount of experience to balance the trade-offs between infrastructure and application needs. If you can pull it off, you will be handsomely rewarded. The first step in your journey is to realize the status quo is broken.
Posted by Sandeep Chanda
on May 16, 2016
The Azure Internet of Things team has recently open sourced the gateway SDKs that can be used to build and deploy applications for Azure IoT.
Two classes of SDKs have been made available: the device SDK, which allows developers to connect client devices to the Azure IoT Hub, and the service SDK, which enables management of the IoT service instances in your hub. The device SDK supports a range of OSes running on low fidelity devices. These devices typically support network communication, can establish a secure communication channel with the IoT Hub, can generate secure tokens for authentication, and have at least 64 KB of RAM.
The device SDK is available in C, .NET, Java, Node.js, and Python, while the service SDK is currently available in .NET, Node.js, and Java. To register clients using the device SDK, first create an IoT Hub instance in Azure using the management portal, then use the connection string of your IoT Hub to register a new device. If you reference the .NET SDK, you can use the Microsoft.Azure.Devices.Client library, which exposes various methods to interact with the gateway, such as the Create method to create a device client and SendEventAsync to send an event to the device hub. The Microsoft.Azure.Devices.Client library supports both AMQP and HTTPS protocols. Messages can also be sent in batches using the SendEventBatchAsync method, which sends a collection of Message objects to the device hub.
The service SDK is available as the Microsoft.Azure.Devices library. You can use the RegistryManager class to register a device.
To receive device-to-cloud messages, you can create a receiver using the EventHubClient class and use the ReceiveAsync method to start receiving event data asynchronously.
You can clone the IoT Gateway SDKs repository from GitHub and customize it for your own gateway solutions using Azure IoT.
The Azure IoT Gateway is promising because, although developers can connect their devices to IoT platforms directly, many scenarios require edge intelligence, for example sensors that cannot connect to the cloud on their own. The IoT Gateway SDKs make it simple for developers to build custom on-premises computation wherever a standard solution doesn't work.
Posted by Sandeep Chanda
on May 5, 2016
During the Build 2016 conference, Vittorio Bertocci, the Principal Program Manager at the Microsoft Identity division, announced the availability of a new authentication library named MSAL (Microsoft Authentication Library). It is poised to become one unified library that provides a single programming model for different identity providers such as Microsoft Accounts and Azure Active Directory.
MSAL finds its origins in ADAL, which was tailored to work exclusively with Azure AD and ADFS. MSAL improves on it by supporting apps regardless of whether the authority is MSA or any Azure AD tenant. It also provides better protocol compliance and overcomes some of the issues with ADAL, such as working with the cache in multi-tenant applications. Another feature that makes it a universal identity library is that it supports standard scope definitions instead of resources that are proprietary to Active Directory. With MSAL you don't need to know the underlying protocols such as OAuth and OpenID Connect. It provides the necessary wrappers for you to program with the library and perform identity related operations at a high level without having to know a lot of protocol details. Notably, multi-factor authentication is supported out of the box. Overall, however, the most fascinating feature of this library is the ability for the app to ask for permissions incrementally and its support for transparent token refresh.
The two primary primitives exposed by MSAL are:
- PublicClientApplication — used for desktop clients and mobile apps
- ConfidentialClientApplication — for server side apps and other web based resources
You can start using MSAL with the new authority endpoint. Note that you need to register your app first and get the client id. The new endpoint supports both personal and work accounts. During the authentication process you will receive both the sign-in info and an authorization code that can be used to obtain an access token. In a single sign-on scenario, that token can be used to access other secured resources that are part of the same sign-in. The following code illustrates how the ConfidentialClientApplication primitive is used to fetch the token and access the resource securely:
ConfidentialClientApplication clientApp = new ConfidentialClientApplication(clientId, null,
new ClientCredential(appKey), new MSALSessionCache(userId, this.HttpContext));
You can then use the AcquireTokenSilentAsync method to get the token by asking for the scopes you need.
Posted by Sandeep Chanda
on April 28, 2016
Alberto Brandolini has been evangelizing the idea of EventStorming for a while now. It is a powerful workshop format to break down complex business problems pertaining to real world scenarios. The idea took its shape from the Event Sourcing implementation style laid out by Domain Driven Design. The workshop produces a model well aligned with the idea of DDD and lets you identify aggregate and context boundaries fairly quickly. The approach also leverages easy-to-use notations and doesn't require UML, which in itself might be a deterrent for workshop participants who are not so familiar with UML notations.
The core idea of EventStorming is to make the workshop more engaging and evoke thought provoking responses from the participants. A lot of times discovery is superficial and figuring out the details is deferred for later. In contrast, EventStorming allows participants to ask some very deep questions about the business problem that were very likely playing in their subconscious minds. It creates an atmosphere where the right questions can arise.
A core theme of this approach is unlimited modeling space. Modeling complex business problems is often constrained by space limitations (mostly the size of the whiteboard), but the approach allows any surface to be used as a platform where the problem can be modeled. You may pick anything that comes in handy and helps you get rid of the space limitations.
Another core theme of this approach is the focus on Domain Events. Domain Events represent meaningful actions in the domain with suitable predecessors and successors. Placing events in a timeline on a surface allows people to visualize upstream and downstream activities and model the flow easily in their minds. Domain events are further annotated with user actions that are represented as Commands. You can also color code the representation to distinguish between user actions and system commands.
The next aspect to investigate is Aggregates. Aggregates here should represent a section of the system that receives the commands and decides on their execution. Aggregates produce Domain Events.
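The relationship between Commands, Aggregates and Domain Events can be sketched in a few lines of Python. This is only an illustration under assumed names: the Inventory aggregate, the PlaceOrder command and the OrderPlaced and OrderRejected events are hypothetical examples, not part of EventStorming itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PlaceOrder:
    """Command: a request coming from a user or another system."""
    order_id: str
    quantity: int


@dataclass(frozen=True)
class OrderPlaced:
    """Domain Event: the aggregate accepted the command."""
    order_id: str
    quantity: int


@dataclass(frozen=True)
class OrderRejected:
    """Domain Event: the aggregate refused the command."""
    order_id: str
    reason: str


@dataclass
class Inventory:
    """Aggregate: receives commands, decides, and produces Domain Events."""
    stock: int
    events: List[object] = field(default_factory=list)

    def handle(self, command: PlaceOrder) -> object:
        # The aggregate decides whether the command may be executed.
        if command.quantity <= self.stock:
            self.stock -= command.quantity
            event = OrderPlaced(command.order_id, command.quantity)
        else:
            event = OrderRejected(command.order_id, "insufficient stock")
        # Every decision is recorded as an event, in order, like a timeline.
        self.events.append(event)
        return event


inventory = Inventory(stock=5)
first = inventory.handle(PlaceOrder("A1", 3))
second = inventory.handle(PlaceOrder("A2", 4))
print(first)
print(second)
```

The events list plays the role of the timeline on the modeling surface: reading it from left to right shows what happened and in which order.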
While the Domain Event is key to this exploration technique, along the way you are also encouraged to explore Subdomains, Bounded Contexts and User Personas. Subsequently, you also look at Acceptance Tests to remove any ambiguity arising out of edge-case scenarios.
Posted by Gigi Sayfan
on April 27, 2016
In recent years the database scene has been boiling. New databases for special purposes seem to emerge every day and for a good reason. Data processing needs have become more and more specialized and with scale you often need to store your data in a way that reflects its structure in order to support proper queries. The default of storing everything in a relational database is often not the right solution. Graphs, which are a superset of trees and hierarchies are a very natural way to represent many real world concepts.
Facebook and its social graph brought them to center stage, but graphs, trees and hierarchies were always important. It is possible to model graphs pretty easily with a relational database because a graph is really just vertices connected by edges, which map very well to the entity-relationship model. But performance is a different story at scale and with deep hierarchies.
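As a minimal sketch of that relational mapping, the Python snippet below stores vertices and edges in two SQLite tables and walks the graph with a recursive query. The table names, the sample data and the reachable helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE vertex (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE edge (
        src INTEGER NOT NULL REFERENCES vertex(id),
        dst INTEGER NOT NULL REFERENCES vertex(id)
    );
    """
)
conn.executemany(
    "INSERT INTO vertex VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")],
)
# A simple chain: alice -> bob -> carol -> dave
conn.executemany("INSERT INTO edge VALUES (?, ?)", [(1, 2), (2, 3), (3, 4)])


def reachable(conn, start_id):
    """Return the names of all vertices reachable from start_id."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(id) AS (
            SELECT dst FROM edge WHERE src = ?
            UNION  -- UNION drops duplicates, so cycles terminate
            SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.id
        )
        SELECT vertex.name FROM vertex JOIN reach ON vertex.id = reach.id
        ORDER BY vertex.name
        """,
        (start_id,),
    ).fetchall()
    return [name for (name,) in rows]


friends = reachable(conn, 1)
print(friends)
```

Each additional hop is another pass over the edge table, which hints at why deep traversals become costly in a relational store as the graph grows.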
Many graph databases exist today with different levels of maturity. One of the most interesting developments in this domain is the Apache TinkerPop incubator project, which is a graph computing framework with active industry contribution. Check out its getting started tutorial to familiarize yourself with the terrain.
Other interesting developments include DataStax (Cassandra) acquiring TitanDB and incorporating it and, of course, Facebook's GraphQL. And there is always Neo4j, which has a mature enterprise-ready solution and an impressive list of customers.
The exciting part is how to combine graph databases with other data systems and how to construct viable data intensive systems that scale well to address new data challenges such as the coming of the IoT era where sensors will generate unprecedented amounts of data.
Posted by Sandeep Chanda
on April 20, 2016
At the recently concluded Build conference, the Microsoft Azure team announced the preview of Functions, an app service that allows you to execute code on demand. Azure Functions is event-driven and allows serverless execution of code triggered by an event or a coordinated set of events. It allows you to extend the application platform capabilities of Azure without having to worry about compute or storage. The events don't have to occur in Azure; they can originate in virtually any environment, including on-premises systems. You can also connect Functions so that they become part of a data processing or messaging pipeline, thereby executing code asynchronously as part of a workflow. In addition, Azure Function apps can scale on demand, allowing you to pay for only what you use.
Azure Functions offers support for a myriad of languages, such as PHP, C# and most of the scripting languages (Bash and PowerShell, for example). You can also upload and trigger pre-compiled code and support dependencies using NuGet and NPM. Azure Functions can also run in a secure setting by making them part of an app service environment configured for a private network. Out of the box, it supports OAuth providers such as Azure Active Directory, Facebook, Google, etc.
To create a Function app, log in to your Azure portal and search for Functions in the Marketplace search bar.
Select the Function App service, provide a name, select a resource group and create the app.
Once you create a Function app, an AppInsights monitoring app is also created and attached to monitor the health of the Function app. After the Function app is deployed, you can navigate to the Function App console from the dashboard. The console has two windows: the Code window (an editor with a code highlighter), where you put your script, and the Logs window, where you can verify the outcome.
You can then make the Function integrate with an event driven workflow as illustrated in the figure below:
You can configure an input and an output for the trigger in the Integrate tab. In this example, a storage queue is set as the default trigger but you can set it to be a Schedule, a Webhook, or Push Notifications amongst others.
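To make the trigger and output idea concrete, here is a minimal in-process sketch in Python. It is not the Azure Functions runtime or its API: a standard-library queue stands in for the storage queue trigger, a plain list stands in for the configured output, and the names run_queue_trigger and shout are hypothetical.

```python
import queue


def run_queue_trigger(input_queue, handler, output):
    """Call handler once per queued message (the trigger) and collect
    each result in output (a stand-in for the configured output)."""
    while True:
        try:
            message = input_queue.get_nowait()
        except queue.Empty:
            break  # nothing left to trigger on
        output.append(handler(message))


def shout(message):
    # Hypothetical function body: the script you would put in the editor.
    return message.upper()


incoming = queue.Queue()
for text in ["order received", "order shipped"]:
    incoming.put(text)

results = []
run_queue_trigger(incoming, shout, results)
print(results)
```

The function itself only sees a message and returns a value; where messages come from and where results go is configuration, which is the separation the Integrate tab expresses.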
There are also a bunch of pre-defined templates you can use to start creating your function from scratch.
Additionally, there are several advanced settings you can configure, such as setting up authentication and authorization, integrating with source control for continuous integration, enabling CORS, and providing API definitions so that clients can easily call your functions.
In a world heavily dominated by the likes of Amazon, Google, and IBM, it is a good move for Microsoft to release Azure Functions, providing more options for developers and enterprises to choose and extend their PaaS implementation.
Posted by Gigi Sayfan
on April 18, 2016
Many organizations have a hard time making decisions about adopting new technologies. There are two prominent camps. The early adopters will jump at any new technology and push towards integrating it or replacing last week's new tech. Then, you have the careful crowd that just doesn't want to rock the boat. As long as everything works they are not interested in switching to something better. Between those two extremes of risk taking and risk aversion are the rest of us, trying to get by and take advantage of new development, but not at the cost of bringing the whole system down. So, what is a rational process for adopting new technologies?
There are several principles that will serve you well. Newer technologies, even if demonstrably better than incumbent alternatives, take time to reach maturity. The more complicated and crucial the technology, the longer it takes to reach maturity. For example, a new library for parsing text may be fine to switch to after playing with it a little bit and verifying it works well. A totally new distributed NoSQL database is a different story and will probably require several years before you should stake your company's fate on it. The crucial element then is timing.
If a new technology has been battle tested long enough, has an active community and demonstrates clear benefits over a more mature alternative, you may consider switching to it even for important projects. A good example is the Flask vs. Django debate. At this point, Flask has crossed the critical threshold as far as I'm concerned. When you do decide to adopt a new technology, you should do it gradually (maybe start with a small non-critical project) and have a plan B (probably sticking with what you have) in case you discover unforeseen issues.
Posted by Sandeep Chanda
on April 11, 2016
The Azure Batch service is now generally available. It is a fully managed service hosted in Azure that lets you configure scheduled jobs and handles compute resource management for other cloud based services. It is a turnkey solution for running large scale High Performance Computing (HPC) applications in parallel at cloud scale. As a platform service, it lets you run resource intensive operations on a managed collection of virtual machines and can scale automatically depending on the need.
There are several use cases for Azure Batch, including scientific computations such as Monte Carlo simulations, financial modelling, media transcoding, and, more commonly, automated testing of applications. The Azure Batch service works very well with scenarios that are intrinsically parallel in nature. Workloads that can be broken into multiple tasks that run in parallel are the best possible use cases for the Azure Batch service. Not only can the managed service run multiple workloads, it can also be configured for parallel calculations with a reduce step at the end.
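The shape of such a workload, independent tasks followed by a reduce step, can be sketched locally in Python with a Monte Carlo estimate of pi. This does not use the Azure Batch API: a thread pool stands in for the pool of virtual machines, and the function names and sample sizes are assumptions chosen for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed, samples):
    """One independent task: count random points inside the unit circle."""
    rng = random.Random(seed)  # per-task generator, so tasks share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(task_count, samples_per_task):
    """Run the tasks concurrently, then combine them in a reduce step."""
    with ThreadPoolExecutor(max_workers=task_count) as pool:
        partial_hits = list(
            pool.map(count_hits, range(task_count),
                     [samples_per_task] * task_count)
        )
    total_hits = sum(partial_hits)  # the reduce step
    return 4.0 * total_hits / (task_count * samples_per_task)


pi_estimate = estimate_pi(task_count=4, samples_per_task=50_000)
print(pi_estimate)
```

Python threads do not speed up CPU-bound work like this; the point is only the structure. Because no task depends on another, the same tasks could be handed to separate machines without changing the reduce step.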
To configure a new Batch service, log in to your Azure portal and then find the Batch managed service in the Azure Marketplace by typing Batch in the search window.
Specify a name for the Batch service and configure the resource group and the storage account:
Once the service is deployed you will see the dashboard to configure the applications and jobs as illustrated in the following figure:
Now that you have successfully created the Batch Account, the two most common ways to use the Batch service are:
- Use the Batch .NET API to programmatically schedule the job
- Use it as part of a larger workflow like Azure Data Factory
In addition, there is an Azure Batch Explorer sample application available on GitHub that you can run to browse and manage the resources in your Batch Account.
You can use the Batch API to perform tasks such as creating a Job, provisioning a Schedule and adding Tasks to a Job. For example, you can create a console application that reads from a file and performs multiple parallel operations based on the content of the file. You can store the application in Azure Blob Storage and then configure the Job to run the application at a regular interval.
Posted by Gigi Sayfan
on April 5, 2016
When building, developing and troubleshooting complex systems, everybody agrees that modularity is the way to go. Different parts or components need to interact only through well-defined interfaces. This way the complexity of each component can be reduced to its inputs and outputs. That's the theory. In practice, this is extremely hard to achieve. Implementation details are sometimes hard to contain. This is where black box and white box testing can come in handy, depending on the situation.
Consider an interface that expects a JSON file as input. If you don't specify the format exactly in a schema, then the format of the file can change and break interactions that worked previously. But even if you put in the work and discipline, properly separate all the concerns and rigorously define all the interfaces, you're still not in the clear. There are two big problems:
- If your system is complicated enough then new development will often require changing interfaces and contracts. When that happens you still need to dive in and understand the internal implementation.
- When something goes wrong, you'll have to troubleshoot the components and follow the breadcrumbs. There is no escaping the complexity when debugging. Under some circumstances, very well factored systems that abstract as much as possible are more difficult to debug because you don't have access to a lot of context.
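To illustrate the earlier point about pinning down a JSON input format, here is a minimal hand-rolled contract check in Python. It is a sketch, not a full JSON Schema implementation, and the ORDER_CONTRACT fields and the validate helper are hypothetical.

```python
import json

# Hypothetical contract: field name -> required Python type.
ORDER_CONTRACT = {"order_id": str, "quantity": int, "tags": list}


def validate(document, contract):
    """Return a list of contract violations (an empty list means valid)."""
    errors = []
    for name, expected in contract.items():
        if name not in document:
            errors.append(f"missing field: {name}")
        elif not isinstance(document[name], expected):
            actual = type(document[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    # Unknown fields usually mean the producer's format has drifted.
    for name in document:
        if name not in contract:
            errors.append(f"unexpected field: {name}")
    return errors


good = json.loads('{"order_id": "A1", "quantity": 3, "tags": []}')
drifted = json.loads('{"order_id": "A1", "qty": "3", "tags": []}')

print(validate(good, ORDER_CONTRACT))
print(validate(drifted, ORDER_CONTRACT))
```

A check like this at the component boundary turns a silent format change into an explicit error, although, as the two problems above note, it does not remove the need to look inside a component when the contract itself has to change.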
But black box and white box testing is not just about the system. It's also a property of the people working with the system. Some people thrive on the black box view and keep an abstract picture of the components as black boxes and of their interactions. Other people must see a concrete implementation and understand how a component ticks before they can climb up the abstraction ladder and consider the whole system.
There are good arguments for both views and while working with most complicated systems you should be able to wear both hats at different times.
Posted by Sandeep Chanda
on March 24, 2016
Azure Data Factory offers capabilities to orchestrate data movement services that can scale using Azure infrastructure. In addition, you can visualize the data lineage across both on-premises and cloud data sources and monitor the health of the pipeline. A few weeks back, the Azure team published a code-free Copy tool for Azure Data Factory that uses a declarative designer to allow hassle-free configuration and management of data pipelines without writing any script. A simple wizard allows you to explore data sources across various cloud offerings such as SQL Azure, Azure Storage, and Azure SQL Data Warehouse, as well as your local SQL Server database. You can also preview the data, apply expressions to validate it, and perform schema mapping for simple transformations. You get the scalability benefits of Azure, so you can move hundreds or thousands of files and rows of data efficiently.
To start, log in to your Azure portal and search for Data Factory under the Data + Analytics marketplace segment. Create an instance of Azure Data Factory as shown in the figure below.
After creating the Data Factory instance, you will see an option called Copy Data (Preview). Click on it to launch the Copy Data wizard.
The first step in the wizard is to set properties such as the name and the schedule configuration: whether you want to run it just once or create a job that runs at a regular interval.
After configuring the schedule, the next step is to define the data source. Pick a connection from the available list of stores. In this example, we selected Azure Blob Storage.
After you select the data source connection, the wizard will direct you to the connector specific steps. In this example, it will prompt you to select the folders or files from which you need the data copied.
You can then provide additional properties, such as the text format and delimiter, to select specific content from the folder or file.
You can preview the data and then set the destination data source to which the data will be copied at a regular interval. In the destination as well, you can specify properties to merge or append content. Once set, review the summary and then save to complete the wizard. The copy will then be triggered based on the schedule.