Posted by Sandeep Chanda
on April 28, 2016
Alberto Brandolini has been evangelizing the idea of EventStorming for a while now. It is a powerful workshop format for breaking down complex, real-world business problems. The idea grew out of the Event Sourcing implementation style laid out by Domain Driven Design. The workshop produces a model that aligns naturally with the ideas of DDD and lets you identify aggregate and context boundaries fairly quickly. The approach also relies on easy-to-use notations rather than UML, which in itself can be a deterrent for workshop participants who are unfamiliar with UML notations.
The core idea of EventStorming is to make the workshop more engaging and to evoke thought-provoking responses from the participants. Too often, discovery is superficial and figuring out the details is deferred until later. EventStorming, by contrast, allows participants to ask some very deep questions about the business problem, questions that were likely already playing in their subconscious minds. It creates an atmosphere where the right questions can arise.
A core theme of this approach is unlimited modeling space. Modeling complex business problems is often constrained by physical space (typically a whiteboard), but this approach lets you leverage anything as a modeling surface. Pick whatever comes in handy and helps you get rid of the space limitations.
Another core theme of this approach is the focus on Domain Events. Domain Events represent meaningful actions in the domain, with suitable predecessors and successors. Placing events along a timeline on a surface lets people visualize upstream and downstream activities and model the flow easily in their minds. Domain Events are further annotated with the user actions, represented as Commands, that cause them. You can also color-code the representation to distinguish user actions from system commands.
The next aspect to investigate is Aggregates. Aggregates here should represent a section of the system that receives the commands and decides on their execution. Aggregates produce Domain Events.
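The command/event/aggregate relationship described above can be sketched in code. This is a minimal illustration in Python with hypothetical names (an `OrderAggregate` handling a `PlaceOrder` command), not part of any EventStorming tooling:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Command:
    """A user or system intent, e.g. 'Place Order'."""
    name: str
    payload: dict

@dataclass(frozen=True)
class DomainEvent:
    """A meaningful fact that happened in the domain, e.g. 'Order Placed'."""
    name: str
    payload: dict

class OrderAggregate:
    """Receives commands, decides whether to execute them, and produces events."""
    def __init__(self):
        self.events = []  # the events, in order, form the timeline

    def handle(self, command: Command) -> DomainEvent:
        if command.name == "PlaceOrder" and command.payload.get("items"):
            event = DomainEvent("OrderPlaced", command.payload)
        else:
            event = DomainEvent("OrderRejected", command.payload)
        self.events.append(event)
        return event
```

The aggregate is the only place where the decision logic lives, which is exactly the boundary the workshop tries to surface.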
While the Domain Event is key to this exploration technique, along the way you are also encouraged to explore Subdomains, Bounded Contexts, and User Personas. Subsequently, you also look at Acceptance Tests to remove any ambiguity arising from edge-case scenarios.
Posted by Gigi Sayfan
on April 27, 2016
In recent years the database scene has been boiling. New databases for special purposes seem to emerge every day, and for good reason: data processing needs have become more and more specialized, and at scale you often need to store your data in a way that reflects its structure in order to support the right queries. The default of storing everything in a relational database is often not the right solution. Graphs, which are a superset of trees and hierarchies, are a very natural way to represent many real-world concepts.
Facebook and its social graph brought graphs to center stage, but graphs, trees, and hierarchies have always been important. It is possible to model graphs fairly easily in a relational database, because a graph is really just vertices connected by edges, which map well to the entity-relational model. But performance is a different story at scale and with deep hierarchies.
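To make the vertices-and-edges mapping concrete, here is a small sketch using SQLite (hypothetical three-node hierarchy): two relational tables hold the graph, and a recursive query walks the hierarchy. This is exactly the kind of query that gets expensive on deep hierarchies at scale:

```python
import sqlite3

# A graph maps onto the relational model as two tables: vertices and edges.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertices (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edges (src INTEGER, dst INTEGER);
    INSERT INTO vertices VALUES (1, 'root'), (2, 'child'), (3, 'grandchild');
    INSERT INTO edges VALUES (1, 2), (2, 3);
""")

# Traversing the hierarchy requires a recursive query; each level of depth
# adds another self-join, which is where relational engines start to strain.
rows = conn.execute("""
    WITH RECURSIVE reachable(id) AS (
        SELECT 1
        UNION ALL
        SELECT e.dst FROM edges e JOIN reachable r ON e.src = r.id
    )
    SELECT v.name FROM vertices v JOIN reachable ON v.id = reachable.id
""").fetchall()
```

A native graph database stores adjacency directly, so the same traversal does not need repeated joins.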
Many graph databases exist today at different levels of maturity. One of the most interesting developments in this domain is the TinkerPop Apache incubator project, a graph computing framework with active industry contribution. Check out the getting started tutorial here to get familiar with the terrain.
Other interesting projects include DataStax (Cassandra) acquiring TitanDB and incorporating it and of course Facebook's GraphQL. And, there is always Neo4J that has a mature enterprise-ready solution and an impressive list of customers.
The exciting part is how to combine graph databases with other data systems and how to construct viable data intensive systems that scale well to address new data challenges such as the coming of the IoT era where sensors will generate unprecedented amounts of data.
Posted by Sandeep Chanda
on April 20, 2016
At the recently concluded Build Conference, the Microsoft Azure team announced the preview of Functions, an app service that allows you to execute code on demand. Azure Functions is event-driven and allows serverless execution of code triggered by an event or a coordinated set of events. It lets you extend the application platform capabilities of Azure without having to worry about compute or storage. The events don't have to originate in Azure; they can come from virtually any environment, including on-premise systems. You can also connect Functions into a data processing or messaging pipeline, executing code asynchronously as part of a workflow. In addition, Function Apps scale on demand, so you pay for only what you use.
Azure Functions supports a myriad of languages, such as PHP, C#, and most of the scripting languages (Bash and PowerShell, for example). You can also upload and trigger pre-compiled code, and manage dependencies using NuGet and NPM. Azure Functions can also run in a secure setting as part of an app service environment configured for a private network. Out of the box, it supports OAuth providers such as Azure Active Directory, Facebook, and Google.
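The event-driven model is easy to picture in miniature. The sketch below is a conceptual stand-in, not the actual Azure Functions programming interface: your code is a plain function invoked once per event, while the platform (simulated here by a loop over a local queue) owns the trigger, the scaling, and the billing:

```python
import queue

def process_order(message: str) -> str:
    """The 'function': pure application code, invoked once per event."""
    return f"processed: {message}"

def run_queue_trigger(q: "queue.Queue[str]") -> list:
    """Stand-in for the platform: drains the queue, invoking the function
    for each message, the way a storage-queue trigger would."""
    results = []
    while not q.empty():
        results.append(process_order(q.get()))
    return results
```

The appeal of the real service is that everything in `run_queue_trigger` is managed for you.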
To create a Function App, log in to your Azure portal and search for Functions in the Marketplace search bar.
Select the Function App service, provide a name, select a resource group and create the app.
Once you create a Function App, it also creates an Application Insights instance that monitors the health of the app. After the Function App is deployed, you can navigate to its console from the dashboard. The console has two windows: the Code window (an editor with syntax highlighting), where you put your script, and the Logs window, where you verify the outcome.
You can then make the Function integrate with an event driven workflow as illustrated in the figure below:
You can configure an input and an output for the trigger in the Integrate tab. In this example, a storage queue is set as the default trigger, but you can set it to be a Schedule, a Webhook, or Push Notifications, among others.
There are also a number of pre-defined templates you can use as a starting point rather than creating your function from scratch.
Additionally, several advanced settings are available to configure, such as setting up authentication and authorization, integrating with source control for continuous integration, enabling CORS, and providing API definitions so clients can easily call the functions.
In a world heavily dominated by the likes of Amazon, Google, and IBM, releasing Azure Functions is a good move for Microsoft, giving developers and enterprises more options for choosing and extending their PaaS implementations.
Posted by Gigi Sayfan
on April 18, 2016
Many organizations have a hard time making decisions about adopting new technologies. There are two prominent camps. The early adopters will jump at any new technology and push towards integrating it or replacing last week's new tech. Then, you have the careful crowd that just doesn't want to rock the boat. As long as everything works they are not interested in switching to something better. Between those two extremes of risk taking and risk aversion are the rest of us, trying to get by and take advantage of new development, but not at the cost of bringing the whole system down. So, what is a rational process for adopting new technologies?
There are several principles that will serve you well. Newer technologies, even if demonstrably better than incumbent alternatives, take time to reach maturity. The more complicated and crucial the technology, the longer it takes. For example, a new library for parsing text may be fine to switch to after playing with it a little and verifying that it works well. A totally new distributed NoSQL database is a different story and will probably require several years before you should entrust your company's fate to it. The crucial element, then, is timing.
If a new technology has been battle-tested long enough, has an active community, and demonstrates clear benefits over a more mature alternative, you may consider switching to it even for important projects. A good example is the Flask vs. Django debate; at this point, Flask has crossed the critical threshold as far as I'm concerned. When you do decide to adopt a new technology, do it gradually (perhaps starting with a small, non-critical project) and have a plan B (probably sticking with what you have) in case you discover unforeseen issues.
Posted by Gigi Sayfan
on April 14, 2016
Before the Agile Manifesto there was Extreme Programming. I discovered it through the http://c2.com wiki (the first wiki) and was instantly impressed. Extreme Programming is indeed extreme. There are values such as communication, simplicity, feedback, and courage; respect was added later. But courage is what caught my attention. Too often, I see people operate out of fear, which is simply debilitating. Even the "move fast and break things" movement is not courageous, because its adherents hedge their bets and concede up front that they'll break things.
Extreme Programming is different. It doesn't make excuses. It doesn't hide behind trade-offs such as, if we move fast, things will get messy. No, the Extreme Programming stance was different. What happens if we take all the best practices and turn the knob to 11?
There are many practices, and you can read about them in detail here. Some of them were revolutionary at the time. How did Extreme Programming fare? It depends. On one hand, it ushered in the era of Agile and, as such, was revolutionary. On the other hand, it was too strict to follow exactly. The Chrysler Comprehensive Compensation (C3) project, the real-world test bed for Extreme Programming, was cancelled after seven years, and a plethora of other Agile methods exploded onto the scene. Many of the original Extreme Programming practitioners, such as Ward Cunningham, Kent Beck, and Martin Fowler, became well-known thought leaders, published successful books, and continued to advance the state of software development. I definitely learned a lot about software development from reading about and trying to practice variants of Extreme Programming.
Posted by Sandeep Chanda
on April 11, 2016
The Azure Batch service is now available for general use. It is a fully managed service hosted in Azure that lets you configure scheduled jobs and perform compute resource management for other cloud-based services. It is a turnkey solution for running large-scale High Performance Computing (HPC) applications in parallel, leveraging cloud scale. Note that it is a platform service that runs resource-intensive operations on a managed collection of virtual machines and can scale automatically on demand.
There are several use cases for Azure Batch, including scientific computations such as Monte Carlo simulations, financial modeling, media transcoding, and, more commonly, automated testing of applications. Azure Batch works best with workloads that are intrinsically parallel: scenarios where the work can be broken into multiple tasks that run independently. Not only can the managed service run multiple workloads, it can also be configured for parallel calculations with a reduce step at the end.
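The split-then-reduce pattern is easy to demonstrate locally. The sketch below is not the Azure Batch SDK; it simulates the pattern with a thread pool and a hypothetical `render_frame` task, where each task stands in for one unit of an intrinsically parallel workload:

```python
from concurrent.futures import ThreadPoolExecutor

def render_frame(frame_id: int) -> int:
    """One independent task, e.g. transcoding a media chunk or running
    one Monte Carlo trial. Placeholder computation here."""
    return frame_id * frame_id

def run_job(frames: list) -> int:
    # Fan out: each frame becomes a task executed in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(render_frame, frames))
    # Reduce: combine the partial results into a final answer.
    return sum(partials)
```

With Azure Batch, the pool of workers is a managed collection of VMs instead of local threads, but the decomposition of the workload is the same.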
To configure a new Batch service, log in to your Azure portal and find the Batch managed service in the Azure Marketplace by typing Batch in the search window.
Specify a name for the Batch service and configure the resource group and the storage account:
Once the service is deployed you will see the dashboard to configure the applications and jobs as illustrated in the following figure:
Now that you have successfully created the Batch Account, you can use the Batch service in two common ways:
- Use the Batch .NET API to programmatically schedule the job
- Use it as part of a larger workflow like Azure Data Factory
In addition, there is an Azure Batch Explorer sample application available on GitHub that you can run to browse and manage the resources in your Batch Account.
You can use the Batch API to perform tasks such as creating a job, provisioning a schedule, and adding tasks to a job. For example, you can create a console application that reads from a file and performs multiple parallel operations based on its content. You can store the application in Azure Blob Storage and then configure the job to run it at a regular interval.
Posted by Gigi Sayfan
on April 5, 2016
When building, developing and troubleshooting complex systems everybody agrees that modularity is the way to go. Different parts or components need to interact only through well-defined interfaces. This way the complexity of each component can be reduced to its inputs and outputs. That's the theory. In practice, this is extremely hard to achieve. Implementation details are sometimes hard to contain. This is where black box and white box testing can come in handy, depending on the situation.
Consider an interface that expects a JSON file as input. If you don't specify the schema exactly, the format of the file can change and break interactions that previously worked. But even if you put in the work and discipline, properly separated all the concerns, and rigorously defined all the interfaces, you're still not in the clear. There are two big problems:
- If your system is complicated enough then new development will often require changing interfaces and contracts. When that happens you still need to dive in and understand the internal implementation.
- When something goes wrong, you'll have to troubleshoot the components and follow the breadcrumbs. There is no escaping the complexity when debugging. Under some circumstances, very well factored systems that abstract as much as possible are more difficult to debug because you don't have access to a lot of context.
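The JSON example above is worth making concrete. This is a minimal contract check, stdlib only, with a hypothetical two-field schema: validating input at the interface boundary catches format drift before it silently breaks a downstream component.

```python
import json

# Hypothetical contract: field name -> expected JSON type.
SCHEMA = {"order_id": int, "items": list}

def validate(raw: str) -> dict:
    """Parse the payload and enforce the interface contract explicitly,
    so a format change fails loudly at the boundary."""
    data = json.loads(raw)
    for name, expected in SCHEMA.items():
        if not isinstance(data.get(name), expected):
            raise ValueError(
                f"contract violation: {name!r} must be {expected.__name__}"
            )
    return data
```

In practice you would use a real schema language (such as JSON Schema), but the principle is the same: the contract lives in one checkable place instead of being implicit.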
But black box and white box testing is not just about the system; it's also a property of the people working with the system. Some people thrive on the black box view, keeping an abstract picture of the components and their interactions. Other people must see a concrete implementation and understand how a component ticks before they can climb up the abstraction ladder and consider the whole system.
There are good arguments for both views, and when working with most complicated systems you should be able to wear both hats at different times.
Posted by Sandeep Chanda
on March 24, 2016
Azure Data Factory offers capabilities to orchestrate data movement services that scale on Azure infrastructure. Beyond that, you can visualize the data lineage across both on-premise and cloud data sources and monitor the health of your pipelines. A few weeks back, the Azure team published a Code-free Copy tool for Azure Data Factory that allows hassle-free configuration and management of data pipelines, without writing any scripts, using a declarative designer. A simple wizard lets you connect data sources across various cloud offerings such as SQL Azure, Azure Storage, and Azure SQL Data Warehouse, as well as your local SQL Server database. You can also preview the data, apply expressions to validate it, and perform schema mapping for simple transformations. You get the scalability benefits of Azure, so you can move thousands of files and rows of data efficiently.
To start, log in to your Azure portal and search for Data Factory under the Data + Analytics marketplace segment. Create an instance of Azure Data Factory as shown in the figure below.
After creating the Data Factory instance, you will see an option called Copy Data (Preview). Click it to launch the Copy Data wizard.
The first step in the wizard is to set properties such as the name and the schedule configuration: whether you want to run the copy just once or as a job that runs at a regular interval.
After configuring the schedule, the next step is to define the data source. Pick a connection from the available list of stores; in this example, we selected Azure Blob Storage.
After you select the data source connection, the wizard directs you to the connector-specific steps. In this example, it prompts you to select the folders and files from which the data should be copied.
You can then provide additional properties to select specific content from the folder or file, such as the text format and delimiter.
You can preview the data and then set the destination data source to which the data will be copied at a regular interval. On the destination side, you can also specify properties to merge or append content. Once set, review the summary and save to complete the wizard; the pipeline will then be triggered based on the schedule.
Posted by Gigi Sayfan
on March 21, 2016
Agile methods can accelerate development significantly compared with traditional methods. This is true in particular for programming where fast edit-test-deploy cycles are possible even on a massive scale (look at Google or Facebook for good examples). However, the database is often another world. If you keep a lot of data in your database (and who doesn't?) then changes to the database schema may not be as swift as regular code changes.
There are several reasons for that. The data is often the most critical asset of an organization; losing or corrupting it could lead to the demise of the company. Imagine a company permanently losing all of its user information, including billing. Any schema change should therefore be made with extreme caution, with measures in place to detect and revert it, and that is not easy to do safely. Another reason is that the database is often at the core of many applications, and some database changes lead to cascading changes in those applications.
Sometimes schema changes, such as splitting one table into two, require migrating data from the old schema to the new one. If you have a lot of data, that may be a long process that takes days or weeks, because you still need to support the live system while the migration is ongoing.
So, is the database doomed to be a thorn in the side of Agile development? Not necessarily. There are strategies you can employ to minimize both the risk and the effort of making database changes. First, use multiple databases: the data can often be split into multiple independent data sets, so a change to one of them does not impact the others. Another approach is to use schema-less databases, where the application deals with the variety of implicit formats, at least during migration periods. Finally, you can get really good at managing database schema changes: build the mechanisms to support them, migrate the data, and ensure you have working recovery mechanisms. This takes a lot of work, but it is worth it if you work on innovative development and need to evolve your database schema.
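The table-splitting case mentioned above can be sketched end to end. This is a toy migration in SQLite with a hypothetical schema (a `users` table whose billing column moves into a new `billing` table), illustrating why such a change must be scripted and verifiable rather than done ad hoc:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- the old schema, with billing data mixed into users
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, card_number TEXT);
    INSERT INTO users VALUES (1, 'alice', '4111-0000'), (2, 'bob', '4222-0000');

    -- migration: copy billing data out, then rebuild users without it
    CREATE TABLE billing (user_id INTEGER, card_number TEXT);
    INSERT INTO billing SELECT id, card_number FROM users;

    CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users_new SELECT id, name FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
""")
```

On a live system, the copy step alone can take days, which is why each statement here would typically become a separate, resumable, and reversible migration step.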
Posted by Gigi Sayfan
on March 15, 2016
Flow is a mental state where you're in "the zone." Time flies by, everything just works, and you're totally focused on the task at hand, applying yourself fully. You can see basketball players attain it sometimes, when they just can't miss a shot. But you can also observe and experience it while programming. It's those amazing sessions where in several hours or a day you churn out something that would typically take weeks.
These 100x productivity periods are highly coveted, not just because of how much you get done, but also because fully realizing your potential is one of the best feelings in the world. To get into flow you need an obstacle-free environment: no distractions, and all the resources you need at hand. Everything must be streamlined.
In software development, Agile methods provide the best breeding ground for getting into flow. Tasks are broken down into chunks small enough to comprehend, and automated tests, continuous integration, and deployment smooth the process of verifying code. Pair programming is arguably another fantastic technique for getting into flow. The driver just programs, while the navigator deals with everything that would typically distract a driver working alone: looking up documentation, checking the status of the latest test run, and scouring StackOverflow for that helpful snippet. The most important part might be that you can't just go binging on HackerNews in the middle of a pair programming session; when two people work together, one keeps the other honest as far as focus goes. I recommend that you explicitly experiment and find the best way for you to achieve flow. It is well worth it.