Posted by Sandeep Chanda
on January 29, 2016
The @WalmartLabs team was founded by Walmart eCommerce to sustain a steady rate of innovation in the face of competition. The team adopted a DevOps culture and migrated the ecommerce platform to the cloud. With continuous application lifecycle management (ALM) as the goal, the group acquired OneOps, a platform that accelerates DevOps through continuous ALM of cloud workloads.
Most recently, the group took a step forward by open sourcing the platform for the benefit of the community at large. This is a huge step, given that OneOps has integration hooks with most of the major cloud providers such as OpenStack, Rackspace, Azure, and Amazon. This is not surprising given that @WalmartLabs is not new to open source. They have contributed some wonderful technologies to the community, like Mupd8 and hapi, and have been actively contributing to React.js as well.
OneOps not only has integration hooks for the major cloud providers, but also lets developers code deployments for a hybrid or a multi-cloud environment. With OneOps, the Walmart eCommerce team is able to run close to 1000 deployments a day. Developer communities can now look forward to managing the lifecycle of an application automatically after deployment: OneOps can take care of scaling and repairing as needed. It is also a one-stop shop for porting applications from one environment to another. Applications or environments built on one cloud (Azure, for example) can be easily ported to another (such as AWS).
Setting up OneOps is easy. If you have an AWS account, it is available as a public AMI. Alternatively, there is a Vagrant image to set up OneOps. The Vagrant project can be checked out and started with the following commands:
$ git clone https://github.com/oneops/setup
$ cd setup/vagrant
$ vagrant up
Once set up, you can monitor the build process in Jenkins on the URI
After installing OneOps, you can log in to the console using your account, and then create an Organization profile. The organization profile bootstraps with suitable default parameters. After creating an organization, you can go to the Cloud tab to select environments to configure and deploy. The following figure illustrates selecting an Azure environment.
OneOps provides a three-phase configuration. You start with the Design phase, where you create a Platform to configure the building blocks for your deployment from existing packs. Then you move to the Transition phase, where you define the environment variables for the targeted deployment. Finally, you move to the Operate phase, where actual instances are created after a successful deployment, and you can monitor the run.
Posted by Gigi Sayfan
on January 27, 2016
It is common advice to learn from other people's successes and mistakes. Go through testimonies, post-mortem analyses, books and articles and they all say the same. But in the dynamic environment of software development you have to be very careful. It is often easy to observe that, all other things being equal, A is clearly better than B. The only problem is that all other things are never equal.
This applies on multiple levels. Maybe you want to choose a programming language, a web framework or a configuration management tool. Maybe you want to incorporate a new development process or performance review. Maybe you are trying to figure out how many people you need to hire and what skills and experience you should shoot for. All of these decisions will have to take into account the current state of affairs and your specific situation. Suppose you read that some startup started using the Rust programming language and within two weeks improved performance 20X. That means nothing. There are so many variables. How bad was their original code? Was the performance issue isolated to a single spot? Was a Rust wizard on the team? Or maybe you read about a company of about your size that tried to switch from waterfall to an agile process and failed miserably. Does that mean your company will fail too? What's the culture of the other company? How was the new process introduced? Was higher management committed?
What's the answer then? How can you decide what to do if you can't learn from other people? Very often, the outcome depends less on the decision itself and more on the commitment and hard work that go into the execution. Gather a reasonable amount of information about the different options (don't start a three month study to decide which spell checker you should use). Consult people you trust who know something about both the subject matter and your situation, ideally people inside your organization.
Make sure to involve all the relevant stakeholders and secure their support. But form your own opinion; don't just trust some supposed expert. Then make a decision and run with it. The bigger the decision or its impact, the more seriously you should consider the risk of making the wrong decision and the cost of pivoting later. If it turns out your decision was wrong, you're now an expert and should know exactly what went wrong and how to fix it.
Posted by Sandeep Chanda
on January 22, 2016
R is the most widely used programming language in the world of data science and heavily used for statistical modelling and predictive analytics. The popularity of R is driving many commercial big data and analytics providers to not only provide first class support for R, but also create software and services around R. Microsoft is not far behind. Months after its acquisition of Revolution Analytics, the company leading the commercial software and services development around R, Microsoft is now ready with R Server. Microsoft R Server is an enterprise scale analytics platform supporting a range of machine learning capabilities based on the R language. It supports all stages of analytics viz. explore, analyse, model and visualize. It can run R scripts and CRAN packages.
In addition, it overcomes the limitations of open source R by supporting parallel processing, thereby allowing a multi-fold increase in analytical capabilities. Microsoft R Server has support for Hadoop, allowing developers to distribute processing of R data models across Hadoop clusters. It also has support for Teradata. The cloud is covered as well: the Data Science Virtual Machine will now come pre-built with R Server Developer Edition, so you can leverage the scale of Azure to run your R data models. For Windows, R Server ships as R Services in SQL Server 2016. While it is currently in CTP, you can install the advanced analytics extensions during the installation of SQL Server 2016 to use a new service called the SQL Server Launchpad and integrate with Microsoft R Open using standard T-SQL statements. To enable R integration, you can run the sp_configure command and give permissions to a user to run R scripts:
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;  -- apply the configuration change
ALTER ROLE db_rrerole ADD MEMBER [name];
You can then connect using an IDE such as RStudio to develop and run R code. Microsoft will also shortly launch R Tools for Visual Studio (RTVS), and you will be able to run R from within Visual Studio.
With enterprises embracing R and providing solutions for commercial use, it is only a matter of time before developers fully embrace this language for enterprise scale data analysis.
Posted by Gigi Sayfan
on January 21, 2016
Agile methodologies have been used successfully in many big companies, but adopting them is often a challenge. There are many reasons: lack of project sponsorship, prolonged user validation, existing policies, legacy systems with no tests - and most importantly culture and inertia. Given all these obstacles, how do you scale Agile processes in a big organization? Very carefully. If you're interested in introducing Agile development practices into a large organization, you can try some of these techniques:
- Show don't tell - Work on a project using Agile methods and get it done on time and on budget.
- Grow organically and incrementally - If you're a manager it's easy. Start with your team. Try to gain mindshare with your peer managers - for example, when collaborating on a project, suggest the use of Agile methods to coordinate deliverables and handoffs. If you're a developer, try to convince your team members and manager to give it a try.
- Utilize the organizational structure - Treat each team or department as a small Agile entity. If you can, establish well-defined interfaces.
- Be flexible - Be willing to compromise and acknowledge other people's concerns. Try to accommodate as much as possible even if it means you start with a hybrid Agile process. Changing people and their habits is hard. Changing the mindset of veteran people in big companies with established culture is extremely difficult.
Finally, if you are really passionate about Agile practices and everything you've tried has failed, you can always join a company that already follows agile practices, including many companies from the Fortune 2000.
Posted by Gigi Sayfan
on January 13, 2016
- The form of internal documentation appropriate for an organization following agile practices.
- Generating external documentation as an artifact/user story.
The first meaning is typically a combination of code comments and auto-generated documentation. A very common assertion in Agile circles is that unit tests serve as live documentation. Python, for example, has a module called doctest, in which the documentation of a function may contain live code examples with their expected outputs. These examples can be executed as tests that verify the code behaves as documented.
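As a minimal sketch of the doctest idea, the docstring below doubles as documentation and as a test. The function name `average` and its examples are made up for illustration.

```python
import doctest


def average(values):
    """Return the arithmetic mean of a non-empty list of numbers.

    The examples below are documentation and tests at the same time.

    >>> average([1, 2, 3])
    2.0
    >>> average([10])
    10.0
    """
    return sum(values) / len(values)


if __name__ == "__main__":
    # Run every example found in this module's docstrings and report failures.
    print(doctest.testmod())
```

If someone changes the function so that an example in the docstring no longer holds, the doctest run reports the mismatch, which keeps the documentation honest.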
Behavior Driven Development
BDD puts a lot of emphasis on specifying even the requirements in an executable form via special DSLs (domain-specific languages), so the requirements can serve as both tests and live, human-readable documentation. Auto-generated documentation for public APIs is also very common. Public APIs are designed to be used by third party developers who are not familiar with the code (even if it's open source). The documentation must be accurate and in sync with the code.
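To make the idea of executable requirements concrete, here is a toy sketch in plain Python. It is not the API of any real BDD tool: the `step` decorator, the `run_scenario` helper and the shopping cart scenario are all hypothetical, and real tools offer far richer DSLs.

```python
import re

# Hypothetical step registry: pairs a sentence pattern with the function that runs it.
STEPS = []


def step(pattern):
    """Register a function as the implementation of a requirement sentence."""
    def decorator(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return decorator


@step(r"Given a cart with (\d+) items")
def given_cart(context, count):
    context["items"] = int(count)


@step(r"When the user adds (\d+) items")
def when_add(context, count):
    context["items"] += int(count)


@step(r"Then the cart contains (\d+) items")
def then_cart(context, count):
    assert context["items"] == int(count), context


def run_scenario(text):
    """Execute each line of a scenario against the registered steps."""
    context = {}
    for line in text.strip().splitlines():
        line = line.strip()
        for pattern, func in STEPS:
            match = pattern.fullmatch(line)
            if match:
                func(context, *match.groups())
                break
        else:
            # No registered step understood this sentence.
            raise LookupError(f"No step matches: {line!r}")
    return context


# The requirement itself: readable by people, runnable by the code above.
SCENARIO = """
Given a cart with 2 items
When the user adds 3 items
Then the cart contains 5 items
"""

final_context = run_scenario(SCENARIO)
```

The scenario text is the part a non-programmer can read and review, while the step functions tie each sentence to real behavior.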
The second meaning can be considered as just another artifact. But there are some differences. Typically, external documentation for a system is centralized. You have to consider the structure and organization and then address the content as a collection of user stories. Unlike code artifacts, external documentation doesn't have automated tests. Documentation testing is an often neglected practice, which is fairly typical because the documentation itself is often neglected. However, some types of external documentation are critical and must serve contractual or regulatory requirements. In these cases, you must verify that the documentation is correct.
Posted by Sandeep Chanda
on January 12, 2016
In the previous two posts (see Part 1 and Part 2), we compared the two most popular cloud platforms, Microsoft's Azure and Amazon's AWS for their offerings in the end-to-end ecosystem of data analytics, both large scale and real time.
In this final post, we will compare Azure's Data Factory with the equivalent offering from AWS, AWS Data Pipeline. Both are fairly similar in their abilities and offerings. However, AWS pitches Data Pipeline as a platform for data migration between different AWS compute and storage services, and also between on-premise and AWS instances, while Azure pitches Data Factory more as an integration service for orchestrating and automating the movement and transformation of data.
In terms of quality attributes, both services are very capable in terms of scalability, reliability, flexibility, and of course, cost of operations. Data Pipeline is backed by the highly available and fault tolerant infrastructure of AWS and hence is extremely reliable. It is also very easy to create a pipeline using the drag and drop console in AWS. It offers a host of features, such as scheduling, dependency tracking, and error handling. Pipelines can not only be run serially, but also in parallel. The usage is also very transparent in terms of moderating control over the computational resources assigned to execute the business logic. Azure Data Factory, on the other hand, provides features such as visualizing the data lineage.
In terms of pricing, Azure charges by the frequency of activities and where they run. A low frequency activity in the cloud is charged at $0.60 and the same activity on premise is charged at $1.50. Similarly, high frequency activities carry higher charges. Note that you are also charged for data movement, separately for cloud and on premise. In addition, pipelines that are left inactive are also charged.
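A rough sketch of how the activity portion of a bill could be estimated from the two low frequency rates quoted above. It assumes each rate applies once per activity per billing period, which the post does not spell out, and the workload of 10 cloud and 4 on-premise activities is hypothetical. Data movement and inactive pipeline charges are left out.

```python
from decimal import Decimal

# Rates quoted in the post for low frequency activities (US dollars).
# Assumption: each rate applies once per activity per billing period.
LOW_FREQUENCY_RATES = {
    "cloud": Decimal("0.60"),
    "on_premise": Decimal("1.50"),
}


def activity_charge(counts):
    """Sum the activity charge for a mapping of location -> number of activities."""
    return sum(
        (LOW_FREQUENCY_RATES[location] * count for location, count in counts.items()),
        Decimal("0"),
    )


# Hypothetical workload: 10 activities in the cloud and 4 on premise.
estimate = activity_charge({"cloud": 10, "on_premise": 4})
print(estimate)  # 10 * 0.60 + 4 * 1.50 = 12.00
```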
Posted by Gigi Sayfan
on January 4, 2016
Agile practices have proven themselves time and again for development and evolution of software systems. But it's not clear if the same agile approach can benefit user-facing aspects such as public APIs, user interface design and user experience. If you change your API constantly, no sane developer will use it. If your user interface design or experience keeps shifting, users will get confused and angry that they have to face a new learning curve whenever you decide to make a change. Sometimes, users will be upset even if the changes are demonstrably beneficial, just because of switching costs. Remember, users didn't subscribe to your agile thinking and are just interested in using your API/product.
What's the answer then? Do you have to be prescient and come up with the ultimate API and user interface right at the beginning? Not at all. There are several solutions that will allow you to iterate here as well. But, you have to realize that iteration on these aspects should and will be slower and more disciplined.
Possible approaches include A/B testing, keeping the old API/interface available, deprecating previous APIs, maintaining backward compatibility, and testing rapid changes on groups of users who sign up for a beta. In general, the more successful you are, the less free you are to get rid of legacy. Probably the best example is Microsoft, which still allows you to run DOS programs on the latest Windows versions and has used a variety of approaches to iterate on the Windows desktop experience, including handling the frustration from users whenever a new version of Windows comes out. Windows 10 is a fine response to the harsh criticism Windows 8 endured.
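One way to keep an old API available while steering callers to its replacement is a deprecation warning. The sketch below uses Python's standard `warnings` module; the `deprecated` decorator and the `fetch_user` functions are hypothetical names chosen for illustration.

```python
import functools
import warnings


def deprecated(replacement):
    """Mark a function as deprecated while keeping it callable."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not the wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


def fetch_user_v2(user_id, include_email=False):
    """New API: returns a dict and supports an optional flag."""
    user = {"id": user_id, "name": f"user-{user_id}"}
    if include_email:
        user["email"] = f"user-{user_id}@example.com"
    return user


@deprecated("fetch_user_v2")
def fetch_user(user_id):
    """Old API: kept for existing callers, implemented on top of the new one."""
    return fetch_user_v2(user_id)["name"]
```

Existing callers of `fetch_user` keep working and get a nudge toward the new function, so the old entry point can be retired on a slower, announced schedule.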
Posted by Sandeep Chanda
on January 1, 2016
In the first part of this series comparing the competing analytics platform offerings from Microsoft Azure and Amazon AWS, we explored Azure Analytics Platform System and AWS Redshift. In this post, we will talk about comparing some of the other products in the ecosystem of analytics.
Microsoft Azure also offers Stream Analytics, again a turnkey proprietary solution from Microsoft, for cost-effective real-time processing of events. With Stream Analytics, you can easily set up a variety of devices, sensors, web applications, social media and infrastructure to stream data and then perform real-time analytical computations on it. Stream Analytics is a powerful and effective platform for designing IoT solutions. It allows streaming millions of events per second and provides mission critical reliability. It also provides a familiar SQL-based language for rapid development using your existing SQL knowledge.
A competing offering from AWS is Kinesis Streams; however, it is geared more towards application insights than devices and sensors. Stream Analytics actually seems to be competing against Apache Storm on Azure, hosted as HDInsight. Both are offered as PaaS and support processing of virtually millions of events per second. A key difference, however, is that Stream Analytics deploys as monitoring jobs, while Storm on HDInsight deploys as clusters of monitoring jobs, hosting multiple stream jobs or other workloads. Another aspect to consider is that Stream Analytics is turnkey, whereas Storm on HDInsight allows a lot of custom connectors and is extensible.
There are pricing considerations to make as well while making a choice between these platforms. In Stream Analytics, pricing is by the volume of data processed and number of streaming units, while in HDInsight, it is charged by the clusters irrespective of jobs that may or may not be running. This post by Jeff Stokes details the differences.
(See also, Part 3 of this series)
Posted by Gigi Sayfan
on December 28, 2015
Python 3 has been around for years. I actually wrote a series of articles for DevX on Python 3 called "A Developer's guide to Python 3.0". The first article was published on July 29 2009, more than 6 years ago!
Python 3 adoption has been slow, to say the least. There are many reasons for this, such as slow porting of critical libraries and not enough motivation for run-of-the-mill developers who don't particularly care about Unicode.
But, the tide may be turning. The library porting scene looks much better. Check out the Python 3 Wall of Super Powers for an up to date status of the most popular Python libraries. The clincher may be the cool new asynchronous support added to recent versions of Python 3. Check out "Asynchronous I/O, event loop, coroutines and tasks".
Python 3.5 adds dedicated async/await syntax inspired by C#. This battle-tested programming model has proved itself robust, expressive and developer-friendly. Check out PEP 492 for all the gory details. Python 3 may have finally arrived!
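A small sketch of what the async/await syntax looks like in practice. The `fetch` coroutine is a made-up stand-in for real I/O, and the example uses `asyncio.run`, which needs Python 3.7 or later.

```python
import asyncio


async def fetch(name, delay):
    """Pretend to do I/O by sleeping without blocking the event loop."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # Run both coroutines concurrently; results keep the argument order.
    return await asyncio.gather(fetch("first", 0.02), fetch("second", 0.01))


# asyncio.run requires Python 3.7+; on 3.5 you would call
# asyncio.get_event_loop().run_until_complete(main()) and avoid f-strings.
results = asyncio.run(main())
print(results)  # ['first done', 'second done']
```

Even though the second coroutine finishes first, `asyncio.gather` returns the results in the order the coroutines were passed in.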
Posted by Gigi Sayfan
on December 23, 2015
Configuration files (a.k.a. config) are files that contain options a program reads at run time, letting you control the operation of the program without making code changes. Back in the 1990s Windows programs used the INI format. The file contained sections and each section contained key-value pairs. The INI format was extremely simple for both people and computers to produce and consume.
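To illustrate the sections and key-value pairs described above, here is a minimal sketch that reads INI-style text with Python's standard `configparser` module. The section names and settings are invented for the example.

```python
import configparser

# A made-up INI file: two sections, each holding key-value pairs.
INI_TEXT = """
[display]
width = 800
fullscreen = no

[network]
host = example.com
port = 8080
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Values are stored as strings; typed getters convert them on request.
width = parser.getint("display", "width")
fullscreen = parser.getboolean("display", "fullscreen")
host = parser["network"]["host"]
```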
When XML (eXtensible Markup Language) became popular, many programs started using it as a configuration format, especially in the Java world. Ant and Maven are prominent examples, as well as many Android files. Then XML became uncool, everybody started writing web applications, and JSON became the preferred format. All the while, another simple format slowly but surely made strides: YAML (YAML Ain't Markup Language), a very human-readable and very machine-readable format.
YAML is one of my favorite formats and one of the reasons I picked Ansible as my go-to configuration and orchestration framework. Then, recently a new contender emerged — TOML (Tom's Obvious, Minimal Language). TOML appears, at first glance, to be just like the good old INI format, but offers a much more rigorous spec while maintaining the simplicity. Several prominent projects use TOML, such as Rust's Cargo package manager and InfluxDB. My guess is that a combination of JSON, YAML and TOML will dominate the configuration file landscape for a spell.