Posted by Sandeep Chanda on February 8, 2016

Power BI online offers powerful analytical capabilities backed by the scale and performance of cloud infrastructure. The ability to host visualizations connected to a multitude of data sources has made Power BI quite popular, and has led to third-party hosted services that provide data from a wide variety of cloud platforms. Power BI now has its own marketplace for such services. Not so long ago, the Visual Studio team also published a connector for fetching data from Visual Studio Online accounts, along with a default dashboard for analysing the progress of a team. Together with the connector for Application Insights, the platform for synthetic monitoring of an application, Power BI can uncover deep insights in Application Lifecycle Management (ALM) that weren't possible before.

To create an analytical dashboard on your project metrics, first log in to your Power BI account and then click "Get Data". Power BI offers several options for fetching data from different sources, including an option to connect to online services.

Click "Get" under the Services tab. You will be redirected to the marketplace services page that contains various options for connecting to different providers. This is where you will find the option to connect to Visual Studio Online using the Visual Studio Online connector. Click on it.

You will be prompted to provide the Visual Studio Online account name and the project name. You may specify "*" as the project name to fetch and analyse data from all projects, but that is not ideal; you should specify the name of a project to fetch data specific to that project only. Next you will be prompted to authenticate. If Visual Studio Online and Power BI are not part of the same subscription, you can authenticate using OAuth 2.0.

Currently, OAuth 2.0 is the only supported authentication protocol. Once connected, Power BI will fetch the project data and create a default dashboard showcasing project metrics such as burndown by story points and active bug count, which you can analyse to determine the health of the project. A connector is also provided for Application Insights, which you can use to fetch synthetic monitoring data for the application being watched. Combined with the project metrics, this lets you drive powerful ALM insights into the application under development.
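
If you want to look beyond the default dashboard, you can also inspect the datasets the connector creates programmatically. Below is a minimal sketch using the Power BI REST API from Python; it assumes you have already obtained an Azure AD OAuth 2.0 access token for the Power BI API and have the requests package installed.

import requests

# Assumes an OAuth 2.0 access token for the Power BI REST API, obtained separately.
access_token = '<your-access-token>'

response = requests.get(
    'https://api.powerbi.com/v1.0/myorg/datasets',
    headers={'Authorization': 'Bearer ' + access_token},
)
response.raise_for_status()

# List the datasets in your workspace; the Visual Studio Online dataset
# should appear here once the connector has run.
for dataset in response.json().get('value', []):
    print(dataset['id'], dataset['name'])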


Posted by Sandeep Chanda on January 29, 2016

The @WalmartLabs team was founded by Walmart eCommerce to provide a sustainable rate of innovation in the face of stiff competition. The team adopted a DevOps culture and migrated the ecommerce platform to the cloud. With continuous application lifecycle management (ALM) in view, the group acquired OneOps, a platform that accelerates DevOps through continuous ALM of cloud workloads.

Most recently, the group took the platform open source for the benefit of the community at large. This is a huge step, given that OneOps has integration hooks for most of the major cloud platforms, including OpenStack, Rackspace, Azure, and Amazon. It is also not surprising, given that @WalmartLabs is no stranger to open source: the team has contributed wonderful technologies such as Mupd8 and hapi to the community, and has been actively contributing to React.js as well.

OneOps not only has integration hooks for all major cloud providers, but also allows developers to script deployments in hybrid or multi-cloud environments. With OneOps, the Walmart eCommerce team is able to run close to 1,000 deployments a day. Developer communities can now look forward to having the lifecycle of an application managed automatically after deployment, with OneOps taking care of scaling and repairing as needed. It is also a one-stop shop for porting applications from one environment to another: applications or environments built on one platform (Azure, for example) can easily be ported to another (such as AWS).

Setting up OneOps is easy. If you have an AWS account, it is available as a public AMI. Alternatively, there is a Vagrant image for setting up OneOps. The Vagrant project can be checked out and brought up with the following commands:

$ git clone https://github.com/oneops/setup
$ cd setup/vagrant
$ vagrant up 

Once set up, you can monitor the build process in Jenkins at http://localhost:3003.
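
If you prefer to script that check rather than refresh the page, here is a minimal sketch (assuming the Vagrant box is running locally and the Python requests package is installed) that simply polls the Jenkins endpoint until it responds:

import time
import requests

# URL where the OneOps Vagrant setup exposes Jenkins, as noted above.
JENKINS_URL = 'http://localhost:3003'

def wait_for_jenkins(timeout=600, interval=10):
    """Poll Jenkins until it answers HTTP requests or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(JENKINS_URL, timeout=5).status_code == 200:
                return True
        except requests.ConnectionError:
            pass  # Jenkins is not up yet; keep waiting.
        time.sleep(interval)
    return False

if __name__ == '__main__':
    print('Jenkins is up' if wait_for_jenkins() else 'Timed out waiting for Jenkins')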

After installing OneOps, you can log in to the console with your account and then create an Organization profile. The organization profile bootstraps with suitable default parameters. After creating an organization, go to the Cloud tab to select the environments you want to configure and deploy against, such as an Azure environment.

OneOps organizes configuration into three phases. In the Design phase, you create a Platform that assembles the building blocks for your deployment from existing packs. In the Transition phase, you define the environment variables for the targeted deployment. Finally, in the Operate phase, actual instances are created after a successful deployment and you can monitor the run.


Posted by Gigi Sayfan on January 27, 2016

It is common advice to learn from other people's successes and mistakes. Testimonials, post-mortem analyses, books and articles all say the same. But in the dynamic environment of software development you have to be very careful. It is often easy to observe that, all other things being equal, A is clearly better than B. The only problem is that all other things are never equal.

This applies on multiple levels. Maybe you want to choose a programming language, a web framework or a configuration management tool. Maybe you want to introduce a new development process or performance review. Maybe you are trying to figure out how many people you need to hire and what skills and experience you should shoot for. All of these decisions have to take into account the current state of affairs and your specific situation. Suppose you read that some startup started using the Rust programming language and within two weeks improved performance 20X. That means nothing. There are too many variables. How bad was their original code? Was the performance issue isolated to a single spot? Was there a Rust wizard on the team? Or maybe you read about a company around your size that tried to switch from waterfall to an agile process and failed miserably. Does that mean your company will fail too? What is the other company's culture? How was the new process introduced? Was upper management committed?

What's the answer then? How can you decide what to do if you can't learn from other people? Very often, the outcome doesn't depend so much on the decision itself as on the commitment and hard work that go into the execution. Gather a reasonable amount of information about the different options (don't start a three-month study to decide which spell checker to use). Consult people you trust who know something about both the subject matter and your situation, ideally people inside your organization.

Make sure to involve all the relevant stakeholders and secure their support. But form your own opinion; don't just trust some supposed expert. Then make a decision and run with it. The bigger the decision or its impact, the more seriously you should consider the risk of making the wrong choice and the cost of pivoting later. If it turns out your decision was wrong, you're now an expert and should know exactly what went wrong and how to fix it.


Posted by Sandeep Chanda on January 22, 2016

R is the most widely used programming language in the world of data science, and is heavily used for statistical modelling and predictive analytics. The popularity of R is driving many commercial big data and analytics providers not only to provide first-class support for R, but also to create software and services around it. Microsoft is not far behind. Months after its acquisition of Revolution Analytics, the company leading commercial software and services development around R, Microsoft is now ready with R Server. Microsoft R Server is an enterprise-scale analytics platform supporting a range of machine learning capabilities based on the R language. It supports all stages of analytics: explore, analyse, model and visualize. It can run R scripts and CRAN packages.

In addition, it overcomes the limitations of open source R by supporting parallel processing, allowing a multi-fold increase in analytical capability. Microsoft R Server supports Hadoop, letting developers distribute the processing of R data models across Hadoop clusters, and it also supports Teradata. The cloud is taken care of as well: the Data Science Virtual Machine will now come pre-built with R Server Developer Edition, so you can leverage the scale of Azure to run your R data models. For Windows, R Server ships as R Services in SQL Server 2016. While SQL Server 2016 is currently in CTP, you can install the advanced analytics extensions during installation to enable a new service called SQL Server Launchpad and integrate with Microsoft R Open using standard T-SQL statements. To enable R integration, run the sp_configure command and grant a user permission to run R scripts:

sp_configure 'external scripts enabled', 1
reconfigure
GO
alter role db_rrerole add member [name]; 
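
With the feature enabled, R scripts can be invoked through T-SQL via the sp_execute_external_script stored procedure. Here is a minimal sketch of calling it from Python; it assumes a local SQL Server 2016 CTP instance with R Services installed, a trusted connection, an installed SQL Server ODBC driver, and the pyodbc package (adjust the driver name and connection details for your environment):

import pyodbc

# Hypothetical connection string; change driver, server and credentials as needed.
conn = pyodbc.connect(
    'DRIVER={ODBC Driver 13 for SQL Server};SERVER=localhost;'
    'DATABASE=master;Trusted_Connection=yes;'
)

# Run a trivial R script that simply echoes its input data set back to SQL Server.
sql = """
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- InputDataSet',
    @input_data_1 = N'SELECT 1 AS answer'
WITH RESULT SETS ((answer INT));
"""

cursor = conn.cursor()
cursor.execute(sql)
print(cursor.fetchall())  # expect a single row containing 1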

You can then connect using an IDE such as RStudio to develop and run R code. Microsoft will also shortly launch R Tools for Visual Studio (RTVS), which will let you run R from within Visual Studio.

With enterprises embracing R and providing solutions for commercial use, it is only a matter of time before developers fully embrace this language for enterprise scale data analysis.


Posted by Sandeep Chanda on January 12, 2016

In the previous two posts (see Part 1 and Part 2), we compared the two most popular cloud platforms, Microsoft's Azure and Amazon's AWS, on their offerings in the end-to-end ecosystem of data analytics, both large scale and real time.

In this final post, we will compare Azure's Data Factory with the equivalent offering from AWS, AWS Data Pipeline. The two are fairly similar in their abilities and offerings. However, while AWS pitches Data Pipeline as a platform for data migration between different AWS compute and storage services, and also between on-premise instances and AWS, Azure pitches Data Factory more as an integration service for orchestrating and automating the movement and transformation of data.

In terms of quality attributes, both services are very capable when it comes to scalability, reliability, flexibility and, of course, cost of operations. Data Pipeline is backed by the highly available and fault-tolerant infrastructure of AWS, and hence is extremely reliable. It is also very easy to create a pipeline using the drag-and-drop console in AWS. It offers a host of features, such as scheduling, dependency tracking and error handling, and pipelines can be run not only serially but also in parallel. Usage is also very transparent in terms of the control you have over the computational resources assigned to execute the business logic. Azure Data Factory, on the other hand, provides features such as visualizing the data lineage.

In terms of pricing, Azure charges by the frequency of activities and where they run. A low-frequency activity running in the cloud is charged at $0.60, and the same activity on premise is charged at $1.50; high-frequency activities are charged at higher rates. Note that you are charged separately for data movement in the cloud and on premise, and pipelines that are left inactive are also charged.
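
As a rough back-of-the-envelope sketch, and assuming for simplicity that the rates above apply per activity per month (check the current Azure Data Factory pricing page for the exact billing units), estimating the orchestration cost of a small hybrid pipeline looks like this:

# Hypothetical activity counts for a small hybrid pipeline.
low_freq_cloud_activities = 5    # charged at $0.60 each (rate quoted above)
low_freq_onprem_activities = 2   # charged at $1.50 each (rate quoted above)

monthly_cost = (low_freq_cloud_activities * 0.60 +
                low_freq_onprem_activities * 1.50)
print('Estimated monthly orchestration cost: $%.2f' % monthly_cost)  # $6.00

# Data movement and inactive pipelines are billed separately on top of this.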


Posted by Sandeep Chanda on January 1, 2016

In the first part of this series comparing the competing analytics platform offerings from Microsoft Azure and Amazon AWS, we explored the Azure Analytics Platform System and AWS Redshift. In this post, we will compare some of the other products in the analytics ecosystem.

Microsoft Azure also offers Stream Analytics, another turnkey proprietary solution from Microsoft for cost-effective real-time processing of events. With Stream Analytics, you can easily set up a variety of devices, sensors, web applications, social media feeds and infrastructure to stream data, and then perform real-time analytical computations on them. Stream Analytics is a powerful and effective platform for designing IoT solutions. It can stream millions of events per second and provides mission-critical reliability. It also supports a familiar SQL-based language for rapid development using your existing SQL knowledge.

A competing offering from AWS is Kinesis Streams; however, it is geared more towards application insights than towards devices and sensors. Stream Analytics actually seems to compete against Apache Storm on Azure, hosted on HDInsight. Both are offered as PaaS and support processing of virtually millions of events per second. A key difference, however, is that Stream Analytics deploys as individual jobs, while Storm on HDInsight deploys as clusters that can host multiple stream jobs or other workloads. Another aspect to consider is that Stream Analytics is turnkey, whereas Storm on HDInsight allows many custom connectors and is extensible.

There are pricing considerations to weigh as well when choosing between these platforms. Stream Analytics is priced by the volume of data processed and the number of streaming units, while HDInsight is charged by the cluster, irrespective of the jobs that may or may not be running. This post by Jeff Stokes details the differences.

(See also, Part 3 of this series)


Posted by Gigi Sayfan on December 28, 2015

Python 3 has been around for years. I actually wrote a series of articles for DevX on Python 3 called "A Developer's guide to Python 3.0". The first article was published on July 29, 2009, more than six years ago!

Python 3 adoption has been slow, to say the least. There are many reasons for this, such as the slow porting of critical libraries and not enough motivation for run-of-the-mill developers who don't particularly care about Unicode.

But the tide may be turning. The library porting scene looks much better; check out the Python 3 Wall of Super Powers for an up-to-date status of the most popular Python libraries. The clincher may be the cool new asynchronous support added to recent versions of Python 3. Check out "Asynchronous I/O, event loop, coroutines and tasks".

Python 3.5 adds dedicated async/await syntax inspired by C#. This battle-tested programming model has proven itself robust, expressive and developer-friendly. Check out PEP-492 for all the gory details. Python 3 may have finally arrived!
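
To get a feel for the new syntax, here is a minimal sketch of two coroutines running concurrently on the asyncio event loop (Python 3.5 or later):

import asyncio

async def fetch(name, delay):
    # Simulate an I/O-bound operation without blocking the event loop.
    await asyncio.sleep(delay)
    return '{} finished after {}s'.format(name, delay)

async def main():
    # Schedule both coroutines concurrently and wait for all the results.
    results = await asyncio.gather(fetch('first', 1), fetch('second', 2))
    for result in results:
        print(result)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()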


Posted by Gigi Sayfan on December 23, 2015

Configuration files (a.k.a. config files) contain options that programs can read, letting you control the operation of a program without making code changes. Back in the 1990s, Windows programs used the INI format: the file contained sections, and each section contained key-value pairs. The INI format was extremely simple for both people and computers to produce and consume.
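
For example, here is a minimal sketch of reading a made-up INI config with Python's standard configparser module:

import configparser

# A made-up INI config: one section containing key-value pairs.
ini_text = """
[database]
host = localhost
port = 5432
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

# Values come back as strings, keyed by section and option name.
print(config['database']['host'])          # localhost
print(config.getint('database', 'port'))   # 5432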

When XML (eXtensible Markup Language) became popular, many programs started using it as a configuration format, especially in the Java world. Ant and Maven are prominent examples, as are many Android files. Then XML became uncool, everybody started writing web applications, and JSON became the preferred format. All the while, another simple format slowly but surely made strides: YAML (YAML Ain't Markup Language), a format that is both very human-readable and very machine-readable.

YAML is one of my favorite formats and one of the reasons I picked Ansible as my go-to configuration and orchestration framework. Then, recently, a new contender emerged: TOML (Tom's Obvious, Minimal Language). TOML appears, at first glance, to be just like the good old INI format, but it offers a much more rigorous spec while maintaining the simplicity. Several prominent projects use TOML, such as Rust's Cargo package manager and InfluxDB. My guess is that a combination of JSON, YAML and TOML will dominate the configuration file landscape for a spell.


Posted by Sandeep Chanda on December 22, 2015

With the leading cloud providers now racing against time to complete their offerings in the analytics space, enterprises are spoilt for choice. Not only are there many offerings to choose from between the leading providers such as Amazon Web Services and Microsoft Azure, but in some cases the offerings within a single provider's stack seem to compete with one another, and it often gets confusing to decide what would really suit your enterprise's needs for real-time and predictive analytics.

Microsoft Azure's analytics offerings include Analytics Platform System, Stream Analytics, Cortana Analytics, Data Factory, Data Lake Storage, HDInsight, and Power BI for visualization. Amazon's analytics suite features Kinesis (yet to be released), Redshift for storage, and QuickSight for visualization.

The Azure Analytics Platform System is a turnkey big data analytics solution from Microsoft that leverages its massively parallel processing data warehouse technology, SQL Server Parallel Data Warehouse. It works in conjunction with Microsoft's Apache Hadoop distribution platform, HDInsight. So, as an enterprise, if you are looking for a turnkey solution for processing massive volumes of data, this could be the platform of choice. It is based on PolyBase infrastructure, and hence allows seamless integration of the relational data warehouse with Hadoop. It also offers significant performance improvements and the lowest cost per terabyte of data, and it can scale linearly to 6 petabytes of user data. Azure APS was recently rebranded from its previous avatar, Parallel Data Warehouse. It is significant competition for Teradata and Oracle, but has limitations in terms of the availability of instances for you to configure and try; you need to reach out to Microsoft account executives to get access to APS.

With Azure APS, Microsoft scores over Amazon, since Amazon's only competition in this space is Redshift, which in many ways does not compare directly with APS; it compares better with Azure SQL Data Warehouse. Another area where Azure betters the competition is its warehousing platform's ability to scale memory and compute independently. That said, Amazon has a lead in having done this well ahead of Azure, and market recognition of APS is yet to happen!

In the next post, we will compare the offerings in other areas of analytics.

(See also, Part 3 of this series)


Posted by Gigi Sayfan on December 9, 2015

Unit testing is a well-established practice for ensuring your code actually does what it's supposed to do. If you have good test coverage, meaning most of your code is exercised by tests, then you'll be much more confident about refactoring and making changes without worrying that you're going to break something. But unit tests are not always simple to write. Unit tests are most useful if they run fast and don't take too long to write and maintain.

The best case is a pure function that takes some inputs and always returns the same output for those inputs. All you have to do is come up with several representative inputs and verify in the test that your function indeed returns the expected output. But often your code will depend on other components: it may read something from a file or a database, log some information, and may even have side effects like sending email.

How would you test such code? The answer is mocking. You replace the dependency on the external component or the side effect with a fake that presents the same interface but typically does nothing except store the calls and their arguments and return a canned response.

For example, consider the following function:

import mail_sender  # the module that actually sends email on our behalf

def send_promotion(customer_emails, promotion_message):
    for email in customer_emails:
        mail_sender.send_mail(email, promotion_message)

You definitely don't want to email all your customers whenever you run a test. The solution is to replace the send_mail() method of the mail_sender module with a mock that, on every call, just records the fact that the code under test tried to send an email to customer X with message M. In your test, you first replace mail_sender.send_mail() with your mock send_mail() function, then call the target function send_promotion() with some customer emails and a promotion message. Once send_promotion() returns, you can verify that your mock send_mail() function was indeed called for each customer in the list and that the text of each message was the promotion message.
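
Here is a minimal sketch of such a test using Python's unittest.mock, assuming send_promotion() lives in a hypothetical module called promotions that imports mail_sender:

from unittest import mock

import promotions  # hypothetical module containing send_promotion()

def test_send_promotion():
    emails = ['alice@example.com', 'bob@example.com']
    message = 'Big sale this weekend!'

    # Replace the real send_mail() with a mock for the duration of the call.
    with mock.patch('promotions.mail_sender.send_mail') as fake_send_mail:
        promotions.send_promotion(emails, message)

    # Verify one call per customer, each carrying the promotion message.
    assert fake_send_mail.call_count == len(emails)
    fake_send_mail.assert_any_call('alice@example.com', message)
    fake_send_mail.assert_any_call('bob@example.com', message)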

This is a very simple example, of course. In practice, the send_promotion() function may apply some formatting to the actual email being sent, the list of customers may be pruned based on various criteria, and you may need to write multiple tests to cover all the possibilities.

