Posted by Sandeep Chanda
on December 1, 2015
The Google Cloud Datalab is Google's answer to big data exploration, analysis and visualization.
All cloud service providers are competing intensely to close the loop of offerings in the realm of big data services that includes exploration, transformation, analysis, and visualization services. In an earlier post, I mentioned the announcement of QuickSight, the data visualization platform from the Amazon Web Services team that now makes the platform an end-to-end provider of cloud-based real-time data stream analytics and visualization services, competing neck and neck with Microsoft Azure's Analytics Platform System. Guess what, Google is not far behind either!
Last week it announced the availability of Cloud Datalab, Google Cloud's platform for data exploration, analysis and visualization. The timing of this beta release couldn't have been sweeter as it comes closely on the heels of Google's announcement of the open sourcing of its machine learning platform TensorFlow.
Datalab provides an interactive notebook environment where you can store your analysis as notebooks and then publish and share your insights with the world. As a developer you can go further to develop, test, and deploy data processing pipelines. Support for BigQuery is the most obvious integration. You can seamlessly write code in Python, SQL, and BigQuery UDF constructs to build and test your pipeline. Under the hood, it leverages Jupyter, a powerful web-based notebook platform for sharing documents containing live code. This becomes particularly useful for storing machine learning and statistical models. You can even reuse existing Jupyter notebooks.
To start using Datalab, you need to first register your application for Google Compute Engine.
If you don't have an account, you can sign up for a free trial with $300 credit. If there are no existing projects in your developer console, a default "My Project" is created.
Once your billing is activated, you can start deploying your Cloud Datalab as an App Engine application.
As far as pricing is concerned, you pay only for the underlying cloud resources that your App Engine application uses, such as BigQuery and Storage. Competition is sure to get hotter with this release, and only time will pronounce the winner!
Posted by Gigi Sayfan
on November 30, 2015
The year was 1999, the internet was blooming, Simple Object Access Protocol (SOAP) burst onto the scene, based on the latest and greatest dev: XML. SOAP had its 15 minutes of fame, but was very verbose and prescriptive and often required tooling to use it efficiently. A whole family of web service specifications (WS-*) grew on top of SOAP and it was used successfully for interoperability.
The relatively heavy weight of the SOAP protocol didn't sit well with the nimble web developers of the time. Representational State Transfer (REST) was just the thing - simple to describe, very flexible and almost directly mapped to HTTP. No need for special libraries to consume.
Fast forward to 2015 and you see a plethora of REST best practices, how to validate your content (JSON schema), how to authenticate, how to report errors, how to implement stateful interactions, etc. Everyone and their cousin has their own flavor. What do you put in the HTTP headers, what do you send as URL parameters and what is in the request body? HATEOAS or not HATEOAS?
These days, every Enterprise/best practice REST API is a proprietary and non-standard half implementation of SOAP. The SOAPjr project attempts to combine the simple parts of SOAP with JSON-RPC to create a clean and yet structured web API protocol. It will be interesting to see how future web APIs are going to be implemented.
Posted by Gigi Sayfan
on November 17, 2015
Agile development, when executed well, is a thing of beauty. A sprint starts, stories/tasks are assigned. People start working in a test-driven manner and the system comes to life piece after piece. With each sprint new functionality is delivered. Refactoring is ongoing. The system architecture is always close to the conceptual ideal. Stakeholders are happy. Developers are happy. Everybody is happy.
And then, there is the real world--pressure, mid-sprint changes, unclear requirements, tests without much coverage, technical debt mounting, production systems crashing, etc. With agile development that's not supposed to happen, but agile practices require dedication and discipline from all stakeholders--including top management, which is not always there. Even at the development team level, some people are not fully on board or are simply not organized enough.
There are some inglorious tasks such as heavy documentation, verifying compliance with obscure standards and checking backward-compatibility with Netscape Navigator 4 on Windows 95, for example. If you reach a point where you notice Agile fatigue and basic practices are no longer followed, you can try gamification. Gamification is all about incorporating game design elements into the work environment so that you get rewarded for performing your duties.
Whenever you squash a bug you get some experience points, the system keeps track of your streak of arriving at meetings on time, and the whole team gets a gold star when all of the sprint is completed successfully. This may sound silly if you never tried it, but movements such as the quantified self demonstrate that people enjoy it and will adapt their behavior to accomplish little game goals that just happen to correspond to real life goals. It can add fun, as well as help keep track of the metrics that actually matter.
Posted by Sandeep Chanda
on November 13, 2015
Just a few days back, Google, in a not so surprising move, announced open sourcing its machine learning platform called TensorFlow. TensorFlow employs the concept of neural computation using data flow graphs. Data flow graphs are made up of nodes and edges, where each node represents a mathematical operation or an endpoint for data ingestion, sending output, or reading and writing persistent variables. Edges, on the other hand, describe the input/output relationship between the nodes. Typically a node represents a computational system and is executed asynchronously in parallel with others.
The primary reason Google quotes for making this an open source platform is to inspire more research, not only in the world of unique computational domains, but also research related to deep neural networks. One of the most important architectural aspects of TensorFlow is that it can be run on a multitude of devices such as desktop, server, mobile, you name it — leveraging computing power from all these different devices and in turn allowing it to demonstrate its capabilities as a truly portable machine learning platform.
Asynchronous computing, threads, and queues are first class citizens in TensorFlow allowing it to maximize performance from the available hardware running the compute elements of the data flow graph. It also supports auto-differentiation out of the box. It can automatically compute derivatives based on the available data combined with the objective function and predictive model definition. It is also extremely flexible in terms of supporting custom libraries. If you can express a computation in terms of a data flow graph, you can leverage TensorFlow's capabilities. As simple as that!
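To make the data flow graph and auto-differentiation ideas a little more concrete, here is a toy sketch in plain Python. It does not use TensorFlow and says nothing about how TensorFlow is implemented; the names Node, add, mul and backward are made up for this illustration. Each node holds the result of an operation, each edge records how the output of one node feeds another, and walking the graph backwards yields the derivatives.

```python
# Toy data flow graph with reverse-mode automatic differentiation.
# Illustrative only: Node, add, mul and backward are hypothetical names,
# not part of TensorFlow's API.

class Node:
    def __init__(self, value, parents=()):
        self.value = value      # result of this node's operation
        self.parents = parents  # incoming edges as (parent_node, local_gradient) pairs
        self.grad = 0.0         # d(output)/d(this node), filled in by backward()


def add(a, b):
    # d(a + b)/da = 1 and d(a + b)/db = 1
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))


def mul(a, b):
    # d(a * b)/da = b and d(a * b)/db = a
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))


def backward(output):
    # Order the nodes so that every node comes after its inputs.
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Walk from the output back to the inputs, applying the chain rule on each edge.
    for node in reversed(order):
        for parent, local_gradient in node.parents:
            parent.grad += node.grad * local_gradient


x = Node(3.0)
w = Node(2.0)
b = Node(1.0)
y = add(mul(w, x), b)   # the graph for y = w * x + b
backward(y)
print(y.value, w.grad, x.grad, b.grad)  # prints: 7.0 3.0 2.0 1.0
```

The point of the sketch is only that once a computation is expressed as a graph, the derivatives come almost for free; a real system adds scheduling, devices and much more on top.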
Posted by Gigi Sayfan
on November 12, 2015
Data is king. Everyone knows that. Big Data has been on the buzzword most wanted list for a while now, along with machine learning, Hadoop and Spark.
Companies eagerly collect data and build massive pipelines to process that data in order to gain insight into what it all means. Guess what? Some companies are able to do that successfully for particular aspects of their operation. This typically happens when the company has:
- the ability to collect data on a massive scale and over a long timeframe
- data that is relatively stable
- a talented team of data scientists combined with domain expertise that can sift through the enormous piles of data and make informed decisions about what's relevant and what's not
- the know-how to configure and tune the huge big data/machine learning apparatus
- the knowledge to interpret the results and know how to tie the numbers to concrete steps or actions in the field.
This is extremely difficult.
A few companies transcend the hype and can actually demonstrate measurable improvements. Many others live in a world of illusion that their fate and strategic direction are optimized by hard data, when in practice they follow a false path. The problem is that there are no controls. No company can afford to split into two and compare its performance over time with two different approaches.
That said, big data, analysis and machine learning are amazing technologies that can really improve and scale a company. The trick is to do it right and that takes a lot of hard work. It gets easier and easier to hook up various data sources into big data pipelines, but the important parts are difficult to automate.
Posted by Gigi Sayfan
on November 3, 2015
Test Driven Development (TDD) is arguably the most impactful Agile practice. Nobody even talks about it anymore, but automated tests were revolutionary when they came on the scene. Fifteen years ago, the common practice was to have a big monolithic application. The developers would produce a build (usually an executable) once in a blue moon, throw it over the fence to the QA department which would bang on it for a while and then open tens if not hundreds of bugs. Then, the developers would engage in a bug squashing period.
Luckily, that's no longer the case and everyone recognizes the importance of automated tests and rapid iterations. TDD means that your whole development process is driven by tests. When you think about problems you phrase the discussion in terms of what tests you'll need. When you model things you consider how testable they are. You might choose between alternatives based on the impact on your test suite.
The ultimate in TDD is the test first movement. This is truly putting tests on a pedestal and treating tests as the most crucial building blocks of your system and its architecture. There are multiple advantages to TDD. The best one is that you get to say TDD all the time. Try it. It's fun: TDD, TDD, TDD. Almost as important is the fact that you will have automated tests for all your code and those tests will even be kept up to date as you evolve and refactor your system. Don't underestimate TDD. Every development team starts with the best intentions regarding all kinds of best practices, but as the pressure mounts many of them go down the drain: test coverage, modularity, coding conventions, refactoring, clean dependencies, etc.
Having tests as your primary driver ensures they will not be neglected, which will give you a better chance at evolving your systems.
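For readers who have never tried test first, here is a tiny sketch of what it can look like in Python. The slugify function and its requirements are invented for this example. In a test-first flow the three test functions would be written (and seen to fail) before the body of slugify exists, and the implementation would then be the simplest code that makes them pass.

```python
# Test-first sketch. slugify and its requirements are hypothetical,
# made up purely to illustrate the workflow.
import re


def slugify(title):
    """Turn a post title into a URL-friendly slug."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


# In a test-first flow, these tests come before the implementation above.
def test_lowercases_and_joins_words_with_hyphens():
    assert slugify("Test Driven Development") == "test-driven-development"


def test_drops_punctuation():
    assert slugify("TDD, TDD, TDD!") == "tdd-tdd-tdd"


def test_empty_title_gives_empty_slug():
    assert slugify("") == ""


if __name__ == "__main__":
    for test in (
        test_lowercases_and_joins_words_with_hyphens,
        test_drops_punctuation,
        test_empty_title_gives_empty_slug,
    ):
        test()
    print("all tests passed")
```

A test runner such as pytest would pick up the test_ functions on its own; the loop at the bottom is just there so the file also runs as a plain script.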
Posted by Sandeep Chanda
on November 2, 2015
Aurelia is written to work with simple conventions that help you save time by not having to write a ton of configuration code. In addition, it also doesn't require writing a lot of boilerplate code to get you up and running. It is written to the ES2016 specifications, and hence incorporates a bunch of modern conventions provided in the standard. It also has robust routing capabilities and supports dynamic routes, child routers and asynchronous screen activation patterns. Not only that, it also supports extending your HTML. You can create custom HTML elements and control the template generation process.
Setting up Aurelia is easy. First you need to set up Gulp for build automation. Using your Node console, run the "npm install -g gulp" command to install Gulp. You also need to install the browser package manager jspm using npm. Now, to set up Aurelia, you can either use Yeoman to generate a bare-bones Aurelia project in your target folder using the command "yo generator-aurelia" or, alternatively, download the Aurelia skeleton navigation project and extract the contents to your target folder. Then navigate to the folder in your Node console and run the "npm install" command to configure the dependencies and the "jspm install -y" command to install the Aurelia library and bootstrap.
You are all set! Aurelia demonstrates pretty amazing levels of browser compatibility and works well in all modern browsers. Not only that, you are not bound by the ES7 specification. You can program pretty much in ES5, ES6, TypeScript, CoffeeScript, and the list goes on.
Posted by Gigi Sayfan
on October 29, 2015
Software is eating the world, as Marc Andreessen said. More and more functions that used to require people are controlled by software. The demand for software engineers is on the rise and the trend seems fairly stable. Many people speculate that at some point software will replace most software engineers as well. But we're not quite there yet.
There is no denying that code and software are becoming increasingly important in our society. The big question is whether or not everybody should learn to code. There are several aspects to the matter. One of the arguments for it is that since code is running so much of our world, everyone should at least study to have a sense of what software is and what it takes to create it and maintain it. I'm not sure I subscribe to this viewpoint. Most people are perfectly happy being ignorant of the inner workings of the many crucial things on which they depend. How many people know what electricity really is or how an internal combustion engine works? More specifically, how many people use apps on their phone without the slightest idea of how they were built?
People that are interested have as many resources as you can dream of available for free to learn on their own via articles, YouTube videos, code academies, sample code on GitHub, etc.
Another argument for the idea that everyone should code is that it will expose people (especially less privileged people) to a very lucrative opportunity to earn a decent living. Again, I'm not sure how effective that would be. It takes a certain aptitude to be a professional developer who actually gets paid. You need to be the type of person who can sit in front of the screen for hours, focus, tell an idiot machine precisely what to do and not get frustrated when it gets it wrong.
If everyone learns to code, either a lot of people will fail or you dial down the content to total fluff. What about the people who have the aptitude and can be successful, but were never exposed to code? This is a different story. Schools should introduce code to everyone and let them experiment, but the purpose would be to pique the interest of kids that might succeed and not to churn out hordes of inept software developers.
Posted by Gigi Sayfan
on October 26, 2015
One of the most basic building blocks of programming is the function. You call some piece of code, passing some parameters, and you get back a returned value. If you grew up with any modern programming language you probably don't even think about it. But the function was a major innovation. In the guts of your computer there are no functions. There is just a program counter that moves forward, and you can tell it to jump somewhere else if you really want to (the dreaded goto).
I started programming BASIC at 11 years old. Note, this was not Visual Basic, which is a full-fledged modern programming language. This was unstructured BASIC. For me, a program was this linear set of instructions to the computer that accepted input, did some processing and then printed or drew something on the screen (a TV back then, no monitors). Each line in the code had a line number and a common control structure was "goto <line number>". So, the program was literally a linear set of instructions executed one after the other.
I still remember my awe when I learned about the "GOSUB" command (go subroutine), which lets you execute a block of commands and then return to the line after the call. Amazing, isn't it? No parameters, of course, and no local variables, just go somewhere and return automatically. This was still a major improvement over the "goto" statement, because with "GOSUB" you could call the same subroutine from different places in your program and it would return to the right place. If you wanted to implement it with "goto" you had to store the return line number in a global variable and "goto" it at the end of the code block. There was also the "DEF FN" pair of keywords that confused me to no end. This is arguably the worst and most inconsistent programming language syntactic structure I ever encountered. It allows you to define an expression, similar to Python's lambda functions. You have to concatenate your function name to the "FN", so each function has a mandatory "FN" prefix such as:
DEF FNSQUARE(x) = x * x
Now you can call your function:
10 PRINT FNSQUARE(5)
Nasty. Isn't it?
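To show why "GOSUB" felt like such a step up from "goto", here is a rough sketch in Python of a program counter walking a numbered program. It is a toy, not a BASIC implementation: the run function and the tuple format for instructions are made up for this illustration. The only thing "GOSUB" adds over "GOTO" is that it remembers where to come back to.

```python
# Toy interpreter: a program counter walks a numbered program, BASIC style.
# Illustrative only: run() and the (op, arg) instruction format are hypothetical.

def run(program):
    lines = sorted(program)   # line numbers in execution order
    index = 0                 # the program counter, as a position in `lines`
    return_stack = []         # where each GOSUB should come back to
    output = []
    while index < len(lines):
        op, arg = program[lines[index]]
        if op == "PRINT":
            output.append(arg)
            index += 1
        elif op == "GOTO":
            index = lines.index(arg)         # jump, with no way back
        elif op == "GOSUB":
            return_stack.append(index + 1)   # remember the line after the call
            index = lines.index(arg)
        elif op == "RETURN":
            index = return_stack.pop()       # resume right after the matching GOSUB
        elif op == "END":
            break
        else:
            raise ValueError("unknown instruction: " + op)
    return output


program = {
    10: ("GOSUB", 100),
    20: ("PRINT", "between calls"),
    30: ("GOSUB", 100),
    40: ("END", None),
    100: ("PRINT", "in subroutine"),
    110: ("RETURN", None),
}
print(run(program))
# prints: ['in subroutine', 'between calls', 'in subroutine']
```

The subroutine at line 100 is called from two different places and returns to the right one each time. With "GOTO" alone, line 110 would need a hard-coded target, or a global variable holding the return line.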
So, whenever you are frustrated with your programming language or wonder why Stack Overflow has the best answer buried in third place, remember that it has never been so good and it will only get better.
Posted by Gigi Sayfan
on October 20, 2015
Agile practices work best with small, cohesive and co-located teams. How do you scale them for a larger organization with multiple teams, possibly in different locations and even time zones? One approach would be to treat all these teams as one big agile team and try to work around the issues of remoteness and time zones. This approach is doomed for multiple reasons.
A better approach would be to break the development into vertical silos in which each team is completely responsible for a particular set of applications or services. This approach scales well as long as you can neatly break development into independent pieces that can be fully owned by one team. One downside of this approach is that knowledge sharing and opportunities for reuse are much more difficult.
A third approach would be to treat teams as users of each other. This allows collaboration and combining multiple teams to tackle big projects without losing the benefits of the original Agile team. This requires careful management to ensure teams' schedules are properly coordinated and dependencies don't halt development. This is nothing new, but in an agile context the planning game is typically done at the team level. When multiple teams work together (even loosely) there needs to be another layer of management that takes care of the cross-team planning.