In Part 1 of this article, we focused on the underlying infrastructure behind each of the deployments in question: Grid, cluster, or Cloud environments. Because your performance and results are only as good as your infrastructure, we will take what we covered there and expand it to look at these environments from a developer's perspective.
But why?
The first question you may ask at this point is: why does a developer need to care about the underlying environment and setup? After all, developers need to code and care about nothing but coding, right?
No! In parallel and distributed environments, developers must know how the infrastructure is set up and configured in order to take advantage of what is actually available. This is less so with Clouds, and we will cover that shortly. But all in all, as a developer you must understand what you are coding for to know whether your application will even benefit from such an environment. In my article for Developer.com, “A Grid for every Application,” I spoke about the fact that every application has its own set of requirements, and it is best to find or build an environment that fits your application. Some aspects of your setup, the amount of memory a node has for example, may be very important to one application and matter much less to another.
Comparison
Table 1 outlines our discussion.
Oddly enough, there are similarities between an HPC Cluster, which is considered very expensive, and our Cloud infrastructure. There is one very important difference between the two, however: SLA requirements! This makes sense if you look at Part 1 of this article. HPC Clusters have very fast networks, whereas Clouds use the public internet for communication. For this reason, the type of work that gets farmed out to a Cloud is very different from the type of work that is scheduled on an HPC Cluster.
Typically speaking, Cloud workloads are longer running and resemble a workflow: loan processing, large data-crunching routines, data mining, and so on. HPC Cluster workloads are more repetitive and shorter in duration. These are problems known as massively parallelizable problems, where a smaller chunk of the work is scheduled on a node. HPC Clusters benefit from having many nodes, a fast scheduler that can utilize those nodes, and high-speed connectivity for communication. Cloud infrastructures are typically managed by FIFO queues, with little policy enforcement. In addition, Cloud environments are VM based and as such less powerful than bare-metal machines.
Grids are somewhere in the middle of the pack. There is good connectivity between the nodes, but the nodes are more heterogeneous in nature: fast machines, VMs, old 486s, etc. For this reason, the jobs that get farmed out to the Grid are more diverse. The infrastructure can get very large, into the tens of thousands, and be dispersed globally. There are pockets of Grids (known as Virtual Organizations) and, generally speaking, a committee that manages the environment. As for the types of jobs that get farmed out to the Grid, there is no rhyme, reason, or pattern to them. The idea is that there is at least one (!) resource that is suitable to your needs.
This is a very high-level view of things, but the reality is not much different:
- Pockets of resources in different data centers
- The resources are of different types
- The scheduler works hard to figure out which resource is suitable for your request
What does that mean to you? If you have a large enough Grid that is capable of handling a range of requests, it can be your savior. This is because application integration is very challenging and time-consuming.
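To make the scheduler's job described above a little more concrete, here is a minimal sketch in Python of matching a request against a pool of heterogeneous resources. Everything in it is an assumption made for illustration: the `Node` and `Request` classes, the `pick_node` function, the sample node names and sites, and the "smallest adequate node wins" policy. It is not taken from any real Grid scheduler, which would also weigh queues, policies, and network location.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """A hypothetical Grid resource; the fields are illustrative only."""
    name: str
    site: str        # data center the node lives in
    cores: int
    memory_gb: int
    is_vm: bool


@dataclass(frozen=True)
class Request:
    """What the application says it needs (hypothetical shape)."""
    min_cores: int
    min_memory_gb: int
    allow_vm: bool = True


def pick_node(nodes: List[Node], req: Request) -> Optional[Node]:
    """Return the smallest node that satisfies the request, or None."""
    suitable = [
        n for n in nodes
        if n.cores >= req.min_cores
        and n.memory_gb >= req.min_memory_gb
        and (req.allow_vm or not n.is_vm)
    ]
    # Assumed policy: best fit. Prefer the node with the least capacity
    # that still qualifies, so bigger machines stay free for bigger jobs.
    return min(suitable, key=lambda n: (n.memory_gb, n.cores), default=None)


# Pockets of resources of different types, in different data centers.
NODES = [
    Node("hpc-01", "dc-east", cores=64, memory_gb=512, is_vm=False),
    Node("vm-17", "dc-west", cores=4, memory_gb=16, is_vm=True),
    Node("old-03", "dc-west", cores=1, memory_gb=1, is_vm=False),
]

if __name__ == "__main__":
    print(pick_node(NODES, Request(min_cores=2, min_memory_gb=8)))
    print(pick_node(NODES, Request(min_cores=2, min_memory_gb=8, allow_vm=False)))
```

Even this toy version shows why a developer benefits from knowing the environment: the same request lands on a very different machine depending on whether the application can tolerate a VM, and a request that asks for more than any node offers finds no resource at all.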