Grid, HPC Cluster and Cloud: Part I, an Infrastructure Perspective

(Editor’s note: This is the first article in a two-part series that compares the infrastructure of Grid, HPC and Cloud. In the second part, we will look at the developer perspective of these three classes of compute infrastructure.)

Infrastructure is important, if for nothing else than the fact that it sets the physical limit against which our performance is measured. We cannot achieve better performance than our infrastructure is capable of delivering. That sounds rather trivial, but it is probably the most important point that I want to make here.

The Chicken and the Egg

The question is whether the infrastructure determines the type of environment or vice versa. If we are looking for an HPC environment, do we need to start with a set of rules? Or do we make decisions based on our requirements, and whatever we end up with is considered an HPC environment? This is a question that many have asked, and it is why there is no clear set of guidelines that one can refer to in order to build such infrastructures. We will take the middle ground throughout this article: I will outline the common attributes that you may find in these infrastructures. I am of the philosophy that requirements dictate the final deployment, but there are a few things to consider:

    * Hardware costs
    * Infrastructure costs
    * Software integration costs
    * Risks: vendor lock-in, security, complexity of infrastructure

We will get back to these considerations and relate them to the attributes of each deployment type.

Table 1 outlines our discussion. As you may notice, I have stayed away from anything directly application-related, as we are focusing on the infrastructure in this article. OK, let’s get started on some of these attributes.

First: Cluster/HPC

The one column that jumps out is the HPC column. Why? It’s expensive, most likely made out of proprietary hardware and software, small in size and most likely tightly coupled with the application, backend, etc. Why do we want such infrastructure? We must have this type of infrastructure in order to meet our SLA requirements. HPC or Cluster Computing (I use the terms interchangeably despite the bad comments that I will receive after writing this article) is “small” in size: from a few hundred nodes up to a couple of thousand nodes. Each node is highly optimized to perform at its peak, and the application is configured to take advantage of the node. For example, if you have four cores per node and 4 GB of RAM per core, chances are that your application was designed (or modified) to take this information into account. We do not like to implement any security inside this type of infrastructure. We like security to be only at the edges, and the nodes to do what they are good at: computation and computation only.
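To make the "four cores per node and 4 GB of RAM per core" example concrete, here is a minimal Python sketch of how an application might size itself to a node. The names `NodeSpec` and `plan_workers`, and the memory figures per worker, are hypothetical and chosen only for illustration; a real HPC application would typically get this information from its scheduler or launcher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hardware profile of one cluster node (illustrative values only)."""
    cores: int
    ram_gb_per_core: int


def plan_workers(node: NodeSpec, ram_gb_per_worker: int) -> int:
    """Return how many workers fit on one node without
    oversubscribing either its cores or its memory."""
    if ram_gb_per_worker <= 0:
        raise ValueError("ram_gb_per_worker must be positive")
    total_ram_gb = node.cores * node.ram_gb_per_core
    by_memory = total_ram_gb // ram_gb_per_worker
    # One worker per core at most; fewer if memory is the tighter limit.
    return min(node.cores, by_memory)


# The node from the text: four cores, 4 GB of RAM per core.
node = NodeSpec(cores=4, ram_gb_per_core=4)
print(plan_workers(node, ram_gb_per_worker=4))  # 4
print(plan_workers(node, ram_gb_per_worker=8))  # 2
```

The point of the sketch is the coupling: change the node and the application’s layout changes with it, which is part of what makes this class of infrastructure fast and also hard to move.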

Second: Grid

Cluster environments are well utilized, unlike Grid environments. Grids, however, benefit from the fact that they are heterogeneous in nature. You may have nodes that are a few years old, and you may have nodes that are brand new. It is up to the resource manager, which we will discuss in a later article, to take advantage of this and find the best fit for a given request. The main point of Grid is to create a single system image (SSI) from the nodes that it has access to. From a client’s perspective, they are accessing a single node: a single, all-powerful image. This makes the resource manager’s job very challenging, and it is one of the main reasons that Grid nodes go underutilized for much of their life span.
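As a rough illustration of what "finding the best fit" can mean on heterogeneous nodes, the following Python sketch picks the node that satisfies a request with the least leftover capacity. This is a simplified best-fit heuristic under assumed inputs, not how any particular Grid resource manager works; the `Node` type, the node names and the capacities are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """A grid node with its currently free capacity (illustrative)."""
    name: str
    free_cores: int
    free_ram_gb: int


def best_fit(nodes: List[Node], cores: int, ram_gb: int) -> Optional[Node]:
    """Return the node that can satisfy the request while leaving the
    least capacity unused, or None if no node is large enough."""
    candidates = [
        n for n in nodes
        if n.free_cores >= cores and n.free_ram_gb >= ram_gb
    ]
    if not candidates:
        return None
    # Prefer the smallest leftover in cores, then in memory.
    return min(
        candidates,
        key=lambda n: (n.free_cores - cores, n.free_ram_gb - ram_gb),
    )


# A mix of older and newer nodes, as in a typical heterogeneous grid.
grid = [
    Node("old-01", free_cores=2, free_ram_gb=4),
    Node("mid-01", free_cores=8, free_ram_gb=32),
    Node("new-01", free_cores=32, free_ram_gb=128),
]

print(best_fit(grid, cores=4, ram_gb=8).name)   # mid-01
print(best_fit(grid, cores=64, ram_gb=8))       # None
```

Even in this toy version, a small request can land on a large node when nothing smaller is free, which hints at why utilization is hard to keep high.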

Third: Cloud

This underutilization forces us to consider other possibilities such as VMs and shared infrastructures. If we are unable to fully utilize a node, why can’t we pay only for the time that we use? This is the premise of Cloud: pay-per-use. In my previous article, I talked about the real cost of Cloud, and that should be taken into consideration. Cloud environments are usually very large and made up of a bunch of VMs that are assigned to a given user on demand, and removed when the user is done using them. The main issue with Cloud is the inability to guarantee any sort of SLA, as the public internet is used to access the cloud environment. With the advances in networking, that issue may soon be moot, but it does exist today and you must be aware of it.
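The pay-per-use argument can be put into numbers with a small Python sketch. All figures below are made up for illustration, and the model is deliberately naive: it compares only a fixed monthly cost for an owned node against an hourly rate for a rented VM, and ignores the other costs (data transfer, storage, integration, staff) that make up the real cost of Cloud.

```python
HOURS_IN_MONTH = 730.0  # assumed average number of hours in a month


def owned_cost_per_used_hour(monthly_cost: float, utilization: float) -> float:
    """Effective cost of each hour actually used on an owned node,
    given the fraction of the month it is busy (0 < utilization <= 1)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return monthly_cost / (HOURS_IN_MONTH * utilization)


def break_even_utilization(monthly_cost: float, cloud_hourly_rate: float) -> float:
    """Utilization at which owning the node costs the same per used
    hour as renting an equivalent VM at cloud_hourly_rate."""
    if cloud_hourly_rate <= 0:
        raise ValueError("cloud_hourly_rate must be positive")
    return monthly_cost / (cloud_hourly_rate * HOURS_IN_MONTH)


# Hypothetical figures: an owned node at 365 per month versus
# a VM rented at 1.00 per hour.
print(owned_cost_per_used_hour(365.0, utilization=0.25))    # 2.0
print(break_even_utilization(365.0, cloud_hourly_rate=1.0))  # 0.5
```

With these assumed numbers, a node that is busy only a quarter of the time costs twice the rental rate per used hour, and owning starts to pay off only above 50 percent utilization.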

What now?

If you are as confused now as you were when you started reading this article, that’s good! There is no silver bullet; you may need or want to mix and match attributes to build your environment, and that’s OK. I gave an outline of a typical scenario. This industry is changing fast, and what is typical today may be atypical tomorrow. Here are some guidelines:

    * Network is king: take care of your network as your infrastructure will be as fast as your network.
    * Build multiple environments and mix and match: you may need a Cluster for one portion of your job, and the rest can be shipped off to a cloud somewhere.
    * Take care of security issues early on: don’t get everything ready only to find out that the security team will not approve your cloud strategy.

Read my next article, in which I will give you a developer perspective on these environments.
