In Part I of this article (Resource Sharing, Part I: An Architect's Perspective), we focused on how the architect is affected by resource sharing and how resource sharing could actually be embraced and thus be thought of as an advantage from the user’s perspective. In this article, we focus on the resource provider and how resource sharing affects it.
Resource sharing, from the infrastructure provider's perspective, is not that interesting! One application is no different from the next, and all the infrastructure provider cares about is that its infrastructure is up and running. This is different from what we spoke about before, in that two users accessing the same resource are in conflict with each other: each user believes that its job is the most important job and thus wants all the resources.
What ends up happening in most cases is that the infrastructure provider or supplier creates silos to meet the demands of each user. Silos mean no sharing, which puts the users at ease, but they also create a bloated environment where over-provisioning takes hold and costs rise. Oddly enough, the infrastructure provider is beginning to care about more than just infrastructure uptime.
Keep in mind that you, as the supplier, are incapable of shifting usage so that your resources are better utilized. You provide the resource, and the users are “supposed” to make the best use of that resource. This causes yet another conflict of interest between the two parties: the user wants dedicated resources and the supplier wants to utilize the available resources in the best way possible.
Conflict of Interest
We elaborated on the fact that if two users are to share a given environment, they are in conflict with each other in that each believes his/her work to be the higher priority. When we add the supplier to the equation, we create yet another source of contention in that the supplier does not want to dedicate resources and the user wishes otherwise. I like to call this the Conflict Triangle (Figure 1), where each party involved is looking out only for his/her own interests and not the overall good of the system.
Figure 1: The Conflict Triangle.
The resource manager put in place to manage access to the backend resource enforces fair-share access, but it cannot provide shared access to resources if the clients are unwilling to share. This problem seems small, but when you are talking about tens of thousands of nodes, it becomes a management nightmare, not to mention a costly one.
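To make "fair-share access" concrete, here is a minimal sketch of one common interpretation, max-min fairness. This article does not say which policy a given resource manager uses, so treat the algorithm as an assumption; the function name `fair_share` and the user names are hypothetical.

```python
def fair_share(capacity, demands):
    """Max-min fair split of `capacity` across users' `demands` (illustrative only)."""
    allocation = {user: 0.0 for user in demands}
    remaining = float(capacity)
    # Users whose demand has not been fully met yet.
    unsatisfied = {user: float(d) for user, d in demands.items() if d > 0}

    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        # Users asking for no more than an equal share get everything they asked for.
        satisfied = [user for user, d in unsatisfied.items() if d <= share]
        if not satisfied:
            # Everyone left wants more than an equal share: split what remains evenly.
            for user in unsatisfied:
                allocation[user] += share
            remaining = 0.0
            break
        for user in satisfied:
            allocation[user] += unsatisfied[user]
            remaining -= unsatisfied[user]
            del unsatisfied[user]
    return allocation


if __name__ == "__main__":
    # Three users competing for 100 units of a shared backend resource.
    print(fair_share(100, {"a": 20, "b": 50, "c": 70}))
```

In this sketch the small request is met in full and the two large requests split the rest evenly. A policy like this only works if the users accept being in a shared pool in the first place, which is exactly the limitation described above.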
There are a number of ways to get around the psychological limitations of sharing, and one of the most prominent is to create separate sandboxes for users to play in. You can create a sandbox in a number of different ways, but my favorites are:
- Virtual machines
- Manual setup and cleanup
The virtual machine (VM) is the obvious way: you have a number of VM instances running on a physical node. Each VM is discrete and essentially not shared across users, but the underlying resource is shared. This method is used widely by cloud vendors.
Figure 2: Virtual Machine Used to Logically Separate.
If you are providing a public infrastructure, then VM separation makes perfect sense. Each user can have its own VM, configured as desired. There is less conflict between the users, as each VM is given its own share of resources, separate from the other VMs running on the box simultaneously.
VMs, however, carry considerable overhead in that you are literally running multiple copies of the OS on the same physical machine.
Private clouds or privately owned infrastructures can take advantage of the fact that fewer, known users access the infrastructure in order to optimize access. Rather than logically separating each environment using VMs, you can auto-manually configure the environment before and after use as outlined below:
- Run startup scripts to set up the environment
- User is given access to the environment
- Run cleanup scripts to get ready for the next user
Figure 3: Auto-manual configuration.
This process assumes a number of conditions to be true:
- The operating system does a good job at local resource management
- The users are not allowed to muck around with job/thread priorities
- A proper access control list (ACL) is in place to prevent improper access to data
In most cases, all the above conditions hold. In fact, a number of legacy and open-source parallel computing environments such as PVM and MPI assume the above conditions by default.
This approach is lighter weight, and can easily be automated so that a script runs after every job to clean up the environment. Security concerns could arise, however, as the cleanup might not be done properly. This is manageable though, and the risk is limited since we are talking about a single environment where the users and their access are known and controlled.
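The setup, access, and cleanup cycle can be sketched as a small wrapper that guarantees cleanup runs even when a job fails. This is only an illustration under stated assumptions: the `sandbox` helper, the per-user scratch directory, and the step callbacks are hypothetical stand-ins for whatever startup and cleanup scripts your environment actually uses.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def sandbox(user, setup_steps=(), cleanup_steps=()):
    """Prepare a scratch environment for `user`, hand it over, then clean it up."""
    # Hypothetical environment: a private scratch directory per job.
    workdir = Path(tempfile.mkdtemp(prefix=f"{user}-"))
    try:
        # 1. Run startup steps to set up the environment.
        for step in setup_steps:
            step(workdir)
        # 2. The user is given access to the environment.
        yield workdir
    finally:
        # 3. Run cleanup steps, then remove leftovers for the next user.
        try:
            for step in cleanup_steps:
                step(workdir)
        finally:
            shutil.rmtree(workdir, ignore_errors=True)


if __name__ == "__main__":
    with sandbox("alice") as workdir:
        (workdir / "job.out").write_text("result")
    # The scratch directory no longer exists once the job is done.
    print(workdir.exists())
```

Putting the cleanup in a `finally` block addresses the concern raised above: the environment is reset whether the job succeeds or fails, so the next user does not inherit the previous user's files.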
I hope that by now you have realized the difference and, more importantly, the disconnect that exists between the application programmers/architects and the infrastructure providers. This disconnect has more severe side effects in Grid/Cloud environments than in other types of applications. There is a lot of communication amongst the users and constant chatter with the back-end environment.
The decision on how an infrastructure is set up and configured must be made collectively by the two parties. A change in the application or the backend could easily throw the performance improvement out the window, and with it your ROI.