In Part 1 of this article, we focused on the underlying infrastructure that makes up each of the deployments in question: Grid, cluster, or Cloud environments. Considering that your performance and outcome are only as good as your infrastructure, we will take what we spoke about and expand it to cover these environments from a developer's perspective.
The first question that you may ask at this point is: Why does a developer need to care about the underlying environment and setup? After all, developers need to code and care about nothing but coding, right?
No! In parallel and distributed environments, developers must know how the infrastructure is set up and configured to take advantage of what's actually available. This is less so with Clouds, and we will cover that shortly. But all in all, as a developer you must understand what you are coding for to know whether or not your application will even benefit from such an environment. In my article for Developer.com, "A Grid for every Application," I spoke about the fact that every application has its own set of requirements and that it is best to find or build an environment that fits your application. Some aspects of your setup, the amount of memory a node has for example, may be very important to one application and not so much to another.
Table 1 outlines our discussion.
Oddly enough, there are similarities between an HPC Cluster, which is considered to be very expensive, and our Cloud infrastructure. There is one very important difference between the two, however: SLA requirements! This makes sense if you look at Part 1 of this article. HPC Clusters have very fast networks, whereas a Cloud uses the public internet for communication. For this reason, the type of work that gets farmed out to a Cloud is very different from the type of work that is scheduled on an HPC Cluster.
Typically speaking, Cloud workloads are longer running and resemble a workflow: loan processing, large data crunching routines, data mining, etc. HPC Cluster workloads are more repetitive and shorter in duration -- problems that are known as Massively Parallelizable Problems, where a smaller chunk of the work is scheduled on each node. HPC clusters benefit from the fact that there are many nodes, a fast scheduler that can utilize these nodes, and high speed connectivity for communication. Cloud infrastructures are typically managed by FIFO queues, and little policy enforcement takes place. In addition, Cloud environments are VM based and as such less powerful than bare metal machines.
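To make the "smaller chunk of the work on each node" idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular cluster scheduler works: the function names and the sum-of-squares workload are hypothetical, and local threads merely stand in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_size):
    """Cut the full workload into small, independent pieces."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def process_chunk(chunk):
    """Hypothetical unit of work: the sum of squares for one chunk."""
    return sum(x * x for x in chunk)


def run_parallel(items, chunk_size=1000, workers=4):
    """Farm the chunks out to workers and combine the partial results."""
    chunks = split_into_chunks(items, chunk_size)
    # Assumption: threads stand in for cluster nodes. A real scheduler
    # would dispatch each chunk to a separate machine instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(run_parallel(list(range(10)), chunk_size=3, workers=2))
```

The point of the sketch is the shape of the problem: each chunk can be processed without knowing anything about the others, which is what lets a fast scheduler keep many nodes busy at once.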
Grids are somewhere in the middle of the pack. There is good connectivity between the nodes, but the nodes are more heterogeneous in nature: fast machines, VMs, old 486s, etc. For this reason, the jobs that get farmed out to the Grid are more diverse in nature. The infrastructure could get very large -- into the tens of thousands of nodes -- and be dispersed globally. There are pockets of Grids (known as Virtual Organizations), and generally speaking a committee that manages the environment.
As far as the types of jobs that get farmed out to the Grid, there is no rhyme or reason or pattern to them. The idea is that there is at least one (!) resource that is suitable to your needs. This is a very high-level view of things, but the reality is not much different:
- Pockets of resources in different data centers
- The resources are of different types
- The scheduler works hard to try to figure out what is the suitable resource for your request
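The matchmaking step in the list above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Node` and `Job` fields, the node names, and the "smallest node that fits" policy are all hypothetical, and real Grid schedulers weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    site: str          # which data center the node lives in
    memory_gb: int
    cores: int
    busy: bool = False


@dataclass
class Job:
    name: str
    min_memory_gb: int
    min_cores: int


def find_suitable_node(job, nodes):
    """Return the smallest idle node that satisfies the job, or None."""
    candidates = [
        n for n in nodes
        if not n.busy
        and n.memory_gb >= job.min_memory_gb
        and n.cores >= job.min_cores
    ]
    if not candidates:
        return None
    # Assumed policy: pick the smallest fit so larger machines stay
    # free for more demanding requests.
    return min(candidates, key=lambda n: (n.memory_gb, n.cores))


if __name__ == "__main__":
    grid = [
        Node("old-486", "dc-a", memory_gb=1, cores=1),
        Node("vm-01", "dc-b", memory_gb=8, cores=4),
        Node("big-iron", "dc-c", memory_gb=256, cores=64),
    ]
    match = find_suitable_node(Job("render", min_memory_gb=4, min_cores=2), grid)
    print(match.name if match else "no suitable resource")
```

Even in this toy version, the outcome depends on what each node actually offers, which is why knowing the memory and core counts in your environment matters to you as a developer.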
What does that mean to you? If you have a large enough Grid that is capable of handling a range of requests, it can be your savior, because application integration is very challenging and time consuming.
In my opinion, Cloud application integration is the simplest. That is because I would put my long-running jobs on the Cloud and spend as little time as possible getting these applications integrated and fine-tuned with the Cloud APIs. Cloud, if you recall, is connectivity over the public internet with little SLA requirement. As such, saving a couple of seconds will not matter to me.
Grid is where I aim to be next. Application integration is tough and will require, at times, re-engineering of a legacy application to benefit from your Grid framework and infrastructure. Many choose to stop here. The application goes into maintenance mode after this phase, and some extra features get implemented over the years.
Some are brave enough to pursue a tighter integration with the underlying hardware in order to get the best bang for the buck. This type of integration is very fine-tuned: you are essentially re-engineering your application to benefit from everything from the amount of memory on a node to the details of the physical layer protocol of your network for better performance.
There is no chicken and egg
The steps that I outlined in the previous section get repeated over and over again. You start with a wide-brush-stroke approach to the integration as new features are added and keep working your way down as you fine-tune your infrastructure.
I guess that is all that I want you to take away from these two articles: continuous integration of your application to take into account the capabilities of your underlying infrastructure. If your application can be parallelized, then you want to move more towards an HPC Cluster with tight integration with the hardware. If your application is more of a workflow type that cannot be parallelized, then you will move more towards a Cloud infrastructure where low cost VMs can be used to process your request.
Again, there are so many exceptions out there that they can make these statements look irrelevant, but I hope that you gained some perspective on how to at least approach the problem.