Posted by Sandeep Chanda
on November 13, 2015
Just a few days back, Google, in a not-so-surprising move, announced that it is open sourcing its machine learning platform, TensorFlow. TensorFlow employs the concept of neural computation using data flow graphs. A data flow graph consists of nodes and edges: a node represents either a mathematical operation or an endpoint for ingesting data, sending output, or reading and writing persistent variables, while edges describe the input/output relationships between nodes. Typically each node is mapped to a computational device and executed asynchronously, in parallel with the others.
The primary reason Google quotes for making this an open source platform is to inspire more research, not only in unique computational domains but also in deep neural networks. One of the most important architectural aspects of TensorFlow is that it can run on a multitude of devices (desktop, server, mobile, you name it), leveraging the computing power of each and demonstrating its capabilities as a truly portable machine learning platform.
Asynchronous computing, threads, and queues are first-class citizens in TensorFlow, allowing it to extract maximum performance from the hardware running the compute elements of the data flow graph. It also supports auto-differentiation out of the box: it can automatically compute derivatives from the available data combined with the objective function and the predictive model definition. It is also extremely flexible in terms of supporting custom libraries. If you can express a computation as a data flow graph, you can leverage TensorFlow's capabilities. As simple as that!
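To make the data flow idea concrete, here is a toy sketch in Python. This is not TensorFlow's actual API, just a minimal illustration of a graph whose nodes are operations or data endpoints and whose edges carry values between them:

```python
# Toy data flow graph: nodes are operations or constant endpoints,
# edges are the input relationships between them. Construction is
# separate from evaluation, as in a real data flow engine.

class Node:
    def __init__(self, op, inputs=()):
        self.op = op          # function that computes this node's value
        self.inputs = inputs  # edges: the nodes that feed this one

    def evaluate(self):
        # Evaluate inputs first (a real engine would run independent
        # nodes asynchronously, in parallel), then apply the op.
        return self.op(*(n.evaluate() for n in self.inputs))

def constant(value):
    # A data-ingestion endpoint: a node with no inputs.
    return Node(lambda: value)

# Build the graph for (a + b) * c without computing anything yet.
a, b, c = constant(2.0), constant(3.0), constant(4.0)
add = Node(lambda x, y: x + y, (a, b))
mul = Node(lambda x, y: x * y, (add, c))

print(mul.evaluate())  # (2 + 3) * 4 = 20.0
```

The key point is that the graph is a description of the computation; running it is a separate step, which is what lets an engine schedule independent nodes in parallel.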
Posted by Gigi Sayfan
on November 12, 2015
Data is king. Everyone knows that. Big Data has been on the buzzword most wanted list for a while now, along with machine learning, Hadoop and Spark.
Companies eagerly collect data and build massive pipelines to process that data in order to gain insight into what it all means. Guess what? Some companies are able to do that successfully for particular aspects of their operation. This typically happens when the company has:
- the ability to collect data on a massive scale and over a long timeframe
- data that is relatively stable
- a talented team of data scientists combined with domain expertise that can sift through the enormous piles of data and make informed decisions about what's relevant and what's not
- the know-how to configure and tune the huge big data/machine learning apparatus
- the knowledge to interpret the results and know how to tie the numbers to concrete steps or actions in the field.
This is extremely difficult.
A few companies transcend the hype and can actually demonstrate measurable improvements. Many others live under the illusion that their fate and strategic direction are optimized by hard data, when in practice they are following a false path. The problem is that there are no controls: no company can afford to split itself in two and compare its performance over time with two different approaches.
That said, big data, analysis and machine learning are amazing technologies that can really improve and scale a company. The trick is to do it right and that takes a lot of hard work. It gets easier and easier to hook up various data sources into big data pipelines, but the important parts are difficult to automate.
Posted by Sandeep Chanda
on November 2, 2015
Aurelia is written to work with simple conventions that save you time by not having to write a ton of configuration code. It also doesn't require a lot of boilerplate code to get you up and running. It is written to the ES2016 specification, and hence incorporates a number of the modern conventions provided in the standard. It also has robust routing capabilities, supporting dynamic routes, child routers, and asynchronous screen activation patterns. Not only that, it supports extending your HTML: you can create custom HTML elements and control the template generation process.
Setting up Aurelia is easy. First you need Gulp for build automation: in your Node console, run npm install -g gulp. You also need to install the browser package manager jspm, again using npm. Now, to set up Aurelia itself, you can either use Yeoman with the generator-aurelia generator to scaffold a bare-bones Aurelia project in your target folder, or alternatively download the Aurelia skeleton navigation project and extract its contents to your target folder. Finally, navigate to the folder in your Node console and run npm install to configure the dependencies, followed by jspm install -y to install the Aurelia library and bootstrap.
You are all set! Aurelia demonstrates pretty amazing levels of browser compatibility and works well in all modern browsers. Not only that, you are not bound by the ES7 specification: you can program pretty much in ES5, ES6, TypeScript, CoffeeScript, and the list goes on.
Posted by Gigi Sayfan
on October 29, 2015
Software is eating the world, as Marc Andreessen said. More and more functions that used to require people are controlled by software. The demand for software engineers is on the rise and the trend seems fairly stable. Many people speculate that at some point software will replace most software engineers as well. But we're not quite there yet.
There is no denying that code and software are becoming increasingly important in our society. The big question is whether or not everybody should learn to code, and there are several aspects to the matter. One argument in favor is that since code runs so much of our world, everyone should at least study enough to have a sense of what software is and what it takes to create and maintain it. I'm not sure I subscribe to this viewpoint. Most people are perfectly happy being ignorant of the inner workings of the many crucial things on which they depend. How many people know what electricity really is, or how an internal combustion engine works? More specifically, how many people use apps on their phone without the slightest idea of how they were built?
People who are interested have more free resources available than they could dream of for learning on their own: articles, YouTube videos, code academies, sample code on GitHub, and so on.
Another argument for the idea that everyone should code is that it will expose people (especially less privileged people) to a very lucrative opportunity to earn a decent living. Again, I'm not sure how effective that would be. It takes a certain aptitude to be a professional developer who actually gets paid. You need to be the type of person who can sit in front of a screen for hours, stay focused while telling an idiot machine precisely what to do, and not get frustrated when it gets it wrong.
If everyone learns to code, either a lot of people will fail, or the content will be dialed down to total fluff. What about the people who have the aptitude and could be successful, but were never exposed to code? This is a different story. Schools should introduce code to everyone and let them experiment, but the purpose would be to pique the interest of kids who might succeed, not to churn out hordes of inept software developers.
Posted by Gigi Sayfan
on October 26, 2015
One of the most basic building blocks of programming is the function. You call some piece of code, passing some parameters, and you get back a returned value. If you grew up with any modern programming language, you probably don't even think about it. But the function was a major innovation. In the guts of your computer there were no functions; there was just a program counter that went forward, and you could tell it to jump somewhere else if you really wanted to (the dreaded goto).
I started programming in BASIC at 11 years old. Note, this was not Visual Basic, which is a full-fledged modern programming language; this was unstructured BASIC. For me, a program was a linear set of instructions to the computer that accepted input, did some processing, and then printed or drew something on the screen (a TV back then, no monitors). Each line of code had a line number, and a common control structure was "goto <line number>". So, the program was literally a linear set of instructions executed one after the other.
I still remember my awe when I learned about the "GOSUB" command (go subroutine), which lets you execute a block of commands and then return to the line after the call. Amazing, isn't it? No parameters, of course, and no local variables; just go somewhere and return automatically. This was still a major improvement over the "goto" statement, because with "GOSUB" you could call the same subroutine from different places in your program and it would return to the right place. If you wanted to implement that with "goto", you had to store the return line number in a global variable and "goto" it at the end of the code block. There was also the "DEF FN" pair of keywords that confused me to no end. This is arguably the worst and most inconsistent programming language syntactic structure I have ever encountered. It allows you to define a function as a single expression, similar to Python's lambda functions. You have to concatenate your function name to the "FN", so every function carries a mandatory "FN" prefix, such as:
DEF FNSQUARE(x) = x * x
Now you can call your function:
10 PRINT FNSQUARE(5)
Nasty. Isn't it?
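For comparison, the modern descendant of that DEF FN construct is Python's lambda: a named single-expression function, with no mandatory prefix bolted onto the name:

```python
# The BASIC definition DEF FNSQUARE(x) = x * x, expressed as a
# Python lambda: no "FN" prefix, no special keywords required.
square = lambda x: x * x

print(square(5))  # 25
```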
So, whenever you are frustrated with your programming language, or annoyed that Stack Overflow has the best answer buried in third place, remember that it has never been this good, and it will only get better.
Posted by Sandeep Chanda
on October 19, 2015
While Amazon Web Services is leaving no stone unturned in its continuous drive to complete the loop as a provider of end-to-end data services, Microsoft Azure is not too far behind. In my previous post, I talked about AWS launching QuickSight for business intelligence. Azure has now come up with its own offering in the form of Azure Data Lake. Azure already collects a lot of streaming data from different devices and lets users focus on insights using its managed Hadoop service, HDInsight.
Azure Data Lake brings two capabilities. The first is the Data Lake Store, a hyper-scale HDFS repository designed for analytics on big data workloads. It handles high-volume, high-speed data processing at any level of scale, supporting near-real-time sensors and devices. The Data Lake Store places no limitations on the type or size of data: it supports processing data in any format, at any scale. Being HDFS-compatible, it supports Hadoop out of the box. It also supports Azure Active Directory, allowing you to configure data streams from within your enterprise network.
The second capability is Azure Data Lake Analytics, which lets you run queries against any storage in Azure (blobs, the Data Lake Store, SQL DB, etc.) and make sense of the large volumes of data stored there. It also comes with the U-SQL query language, designed along the lines of familiar SQL syntax. You can use the query language to declaratively create big data jobs that run against the stored datasets, and U-SQL jobs can be designed from your familiar Visual Studio environment. Data Lake Analytics jobs are not limited to U-SQL; you can also write your own code to create them.
Posted by Sandeep Chanda
on October 12, 2015
A slew of recent product releases in the world of Amazon Web Services (AWS) suggests Amazon is in a hurry to complete its portfolio of Big Data offerings. It has pretty much covered its ground when it comes to collecting, storing, and processing large volumes of data, with platforms such as Amazon RDS, DynamoDB, and Redshift created for the purpose. What was really missing was a product that could derive insights in real time, not just for technology experts but also for business users, who could then make business decisions based on what they interpret from visually interactive dashboards.
QuickSight aims to complete the loop, making AWS a provider of low-cost, full-fledged, scalable business intelligence services that deliver data insights from a wide range of data sources for a fraction of the cost of legacy BI solutions. According to Werner Vogels, CTO at Amazon.com, there is an inherent gap between the volumes of data that enterprise-scale applications generate, store, and process and the key decisions that business users make on a daily basis. QuickSight aims to bridge this gap.
QuickSight is a cloud-powered business intelligence service that tackles the speed, complexity, and cost of generating insights from large volumes of data. It is also pretty easy to set up and use. Behind the scenes, QuickSight is powered by a superfast, parallel, in-memory calculation engine named SPICE, designed to answer queries on very large datasets in milliseconds. This allows QuickSight to scale quickly to thousands of users across a range of data sources available in AWS. In addition to SPICE, QuickSight includes technologies for automatically discovering data changes, curating data for analysis, and making suggestions based on parameters such as the data source's metadata and query history.
Posted by Gigi Sayfan
on October 6, 2015
The value of college education in general is a hotly debated topic these days, especially in the US. I'll focus here on computer science education. There is little question that the average (and even most of the less than average) comp sci graduates will have no problem landing a job right out of school. The shortage of good software engineers is getting more urgent as more and more of our world is run by software. So, economically it may be a good decision to pursue a comp sci degree, although many vocational programming schools have popped up and apparently place most of their graduates.
But, how much value does the comp sci education provide to the aspiring software engineer once they're out of school and successfully land their first job? How much value does an organization derive from the academic training of a college educated software engineer?
In my opinion, not much at all. I have a computer science degree. My son just started a comp sci program. I interview a lot of software engineering candidates with and without degrees.
The curriculum hasn't changed much since my days (more than 25 years ago). The material is appropriate academically and computer science is a fascinating field, but what is being taught has very little bearing on the day-to-day tasks of a software engineer.
The real world is super nuanced. Take, for example, the very important issue of performance. The performance of a system has so many dimensions and can be improved in myriad ways: changing the requirements, improving perceived performance, providing approximate results, trading off space vs. speed vs. power, trading off flexibility vs. hard coding, the selection of libraries, how much security and debugging support you throw in, the selection of file formats and communication protocols, hardware, caching, and more. Then there is, of course, algorithmic complexity, but even there, most of the time, it is about intelligently mixing together existing algorithms and data structures. In all my years in the industry, developing high-performance production systems and working with other engineers who developed low-level code and inner-loop algorithms, I don't recall a single case where formal analysis was used. It was always about empirical profiling, identifying hotspots, and making appropriate changes.
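That empirical workflow, profile first and then optimize the measured hotspots, can be sketched with Python's built-in cProfile module. The function names here are invented for illustration:

```python
import cProfile
import io
import pstats

def hotspot():
    # A deliberately slow inner loop standing in for real work.
    return sum(i * i for i in range(100_000))

def program():
    # The "system" under study: calls the hotspot repeatedly.
    return [hotspot() for _ in range(20)]

# Measure where the time actually goes, instead of reasoning
# about asymptotic complexity on paper.
profiler = cProfile.Profile()
profiler.enable()
program()
profiler.disable()

# Print the top five entries by cumulative time; the hotspot
# function should dominate the report.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Once the report names the hot function, you make a targeted change there and profile again; everything else in the program is usually not worth touching.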
Note that pure computer science is very important for advancing the field, and it is applied by a very small number of people who do basic research and core technology. It's just not especially relevant to the day-to-day work of the vast majority of software developers.
Posted by Sandeep Chanda
on September 29, 2015
The SQL Elastic Database Pool allows you to run SQL databases with a private pool of resources dedicated for that purpose. Azure SQL Database capabilities have recently been significantly enhanced to support high-end scalability, allowing management of fairly large-scale databases with huge amounts of compute and storage.
While cloud services in Azure were built to scale from the get-go, there were limitations around scaling the SQL database, especially if you were building a multi-tenant application. Not anymore. With the elastic database pool you can isolate the database needs for each customer and charge them based on consumption of actual resources.
It is very typical of SaaS applications to use a separate database for each tenant. Without the elastic pool, you either allocated more resources than needed from the start, not knowing what actual customer consumption would look like, or you started with a low allocation and risked poor performance. With the SQL elastic database pool, you don't have this problem anymore. You can create a private pool of resources (compute, I/O, etc.) and run multiple isolated databases in it. You can set SLAs for each database for peak and low usage, depending on predicted customer usage, and leverage the management APIs to script the configuration of the databases. In addition, you can run queries that span multiple databases (pretty cool!).
The elastic database pool has three pricing tiers: Basic, Standard, and Premium. These offer a pretty wide range of pricing and resource choices for setting up your database pool. You can also migrate very easily between the tiers, giving you the flexibility to gradually move to a higher pricing tier as usage grows.
Posted by Gigi Sayfan
on September 25, 2015
Enter Mozilla. Mozilla has always created innovative stuff. Firefox was built in C++ on a technology called XPCOM (Cross Platform Component Object Model), which took Microsoft's very successful COM technology and recreated a cross-platform version from scratch. A couple of cool independent products were even developed on top of it, such as ActiveState's Python IDE Komodo. But it was a very complicated piece of software.
Fast forward to today, and Mozilla is building a new prototype browser using a new language of its own making, known as Rust. Rust is unique: it brings memory management to the forefront and, in the process, also takes care of concurrent programming. It is able to detect at compile time a slew of issues that traditionally were discovered only at runtime. It is said, tongue in cheek, that if your program compiles, it is correct. The problem is that getting a Rust program to compile is a non-trivial adventure; I played with Rust a little bit, and it requires a lot of persistence. Right now, version 1.3 is out. There is an enthusiastic community around Rust, and a lot of things are done right: there is a strong emphasis on documentation, there is support for projects, and packaging is not an afterthought (Python's Achilles' heel). Rust also has great integration with C and other languages, so you can leverage many existing libraries.
I believe Rust is going to be a major player where critical, secure and performance-sensitive code is required. Give it a try, but don't count on it for serious production code just yet.