By Matt Gillespie
Intel® Software Network
Orig. Published On: Tuesday, October 30, 2007
Last Modified On: Tuesday, October 30, 2007
Original URL:
http://softwarecommunity.intel.com/articles/eng/1729.htm
Introduction
Grid computing based on Web services provides a compelling means for enterprises to create High-Performance Computing infrastructures. Adding to the inherent cost/performance advantages of grid networks based on Intel® Xeon® processors and/or Itanium® processors, building solutions from pre-existing Web services adds efficiency and simplifies the addition of standards-based core functionality to those systems. For those organizations that want expert external guidance in the development of such systems, Intel® Solution Services, Intel Corporation’s worldwide professional services organization, offers assistance that can range from minor decision support to full consulting and design services.
This article connects the technical community with the decision-making community on topics related to grid computing with Web services on Intel® architecture. It helps network designers, solution architects, and policy makers at corporations, research institutions, and other grid-computing facilities learn about using Web services to enable High-Performance Computing on Intel architecture.
Grids as Interoperability Solutions for High-Performance Computing
The success of well-known, non-commercial grid-computing initiatives such as SETI@Home* presaged the widespread use of grid networks in commercial enterprises as a cost-effective and scalable means of solving large computing problems. It is not uncommon for reports to describe the compute time for a given type of operation shrinking from a matter of years on a single monolithic computer to days on a distributed grid. By harnessing otherwise-unused computing resources, companies now widely use grid computing in implementations such as industrial research and complex design simulations.
Grid computing provides for the highest degree of flexibility in the context of the following hierarchy of execution vehicles:
- Individual compute nodes, such as stand-alone servers based on Intel® Xeon® processors and Itanium® processors.
- Server clusters, which are traditionally deployed within homogeneous organizations or networks. While this architecture provides robust compute power, accessing and deploying applications are largely manual tasks.
- Computing grids, which provide a middleware-based environment that allows automation and interoperability, so that applications can be deployed efficiently and quickly to a number of sites.
Web services function as an integration technology (middleware) that significantly enhances the ability of enterprises to put grid computing to work from existing software building blocks. Beyond its ability to create very large high-performance computing architectures from an aggregation of distributed machines, grid computing is now coming into its own as a means for enterprises to manage their distributed resources. One means toward this end is the development of grid services, a class of Web services specifically defined to meet the needs of grid computing.
A proposed recommendation for the Open Grid Services Infrastructure (OGSI)* is before the Global Grid Forum, of which Intel is a sponsor, to provide international standards for grid services. Those standards can be expected to create substantial efficiencies over the proprietary grid systems that many enterprises are currently developing to make use of Web services in grid computing.
The Evolution of Web Services as an Enabling Grid Middleware Technology
By providing functional building blocks from which developers can create grid-computing systems, grid services enable the efficient construction of secure, standards-based grid infrastructures. By abstracting basic functionality away from the primary development effort, grid services allow developers to focus on specific business needs.
The Globus Project builds on OGSI with the Open Grid Services Architecture (OGSA), which can be implemented with the Globus Toolkit*, an open-source SDK used for building grids with grid services. This set of Java* classes and associated tools provides the basic functionality called for in the OGSI specification, as well as the means to convert existing Web services and applications to grid services. This functional framework addresses requirements such as the following, each of which is described in more detail below:
- Security infrastructure.
- Lifetime management.
- State management.
Security infrastructure. The toolkit uses encryption based on Public Key Infrastructure (PKI)* to manage a single sign-on environment, which allows users to authenticate just once to gain access to all of the grid services involved in a solution. This functionality is essential to manage complex systems of services, since users would otherwise be compelled not only to manage a potentially large number of separate credentials, but also to enter those credentials repeatedly.
OGSA manages authentication credentials by means of certificates, which are assigned to both users and grid services. By granting an application access to their certificate, a user enables that application to authenticate to any number of grid services transparently on behalf of the user. Further, the application itself may authorize grid services to use that certificate to access third-party services where needed. This mechanism also supports the use of group permissions, allowing administrators to manage authentication rights for collective groups of end users and services.
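The delegation chain described above can be sketched in a few lines of Python. This is a toy model only: the `Credential` and `GridService` classes, the `GROUPS` table, and the names `design-app`, `mesh-service`, and `storage` are all hypothetical, no real cryptography is performed, and nothing here is the Globus Toolkit API. It simply illustrates how one sign-on can be passed from user to application to grid service, with access decided by group membership.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical group table: group name -> user identities in that group.
GROUPS: Dict[str, Set[str]] = {"researchers": {"alice", "bob"}}


@dataclass(frozen=True)
class Credential:
    """Toy stand-in for a certificate; no real cryptography is performed."""
    subject: str                            # the user the credential speaks for
    holder: str                             # the party presenting it
    issuer: Optional["Credential"] = None   # credential it was delegated from

    def delegate(self, new_holder: str) -> "Credential":
        """Let new_holder act on behalf of the same subject."""
        return Credential(self.subject, new_holder, issuer=self)

    def chain(self) -> List[str]:
        """Holders from the original user down to the current presenter."""
        holders = []
        cred: Optional[Credential] = self
        while cred is not None:
            holders.append(cred.holder)
            cred = cred.issuer
        return holders[::-1]


class GridService:
    def __init__(self, name: str, allowed_groups: Set[str]):
        self.name = name
        self.allowed_groups = allowed_groups

    def authorize(self, cred: Credential) -> bool:
        """Admit the caller if the credential's subject is in an allowed group."""
        return any(cred.subject in GROUPS.get(g, set())
                   for g in self.allowed_groups)


# The user signs on once; everything after that happens through delegation.
user = Credential(subject="alice", holder="alice")
app = user.delegate("design-app")        # user grants the application access
mesh = app.delegate("mesh-service")      # application authorizes a grid service

storage = GridService("storage", {"researchers"})
print(storage.authorize(mesh))           # third-party service checks the subject
print(mesh.chain())                      # who handed the credential to whom
```

In this sketch the third-party service never sees a password; it looks only at whose identity the presented credential carries and whether that identity belongs to a permitted group. A real deployment would also verify signatures along the chain, which is omitted here.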
Lifetime management. To manage the one-to-many relationship between applications and the grid services on which they are based, the application has the ability to invoke instances of specific services as they are required. Those services, in turn, can invoke additional services as needed. Once a service is no longer needed, it must be stopped, releasing the memory and processor resources it had used so that they are available for other purposes.
As part of that responsibility, the lifetime-management mechanisms must accommodate failures in individual nodes by ensuring that a failed node does not leave unfinished processes consuming resources on other machines. OGSA addresses this requirement by assigning a set lifetime to each service instance, after which the instance is automatically terminated. These lifetimes can be extended, if needed, during execution. By standardizing and centralizing the mechanisms responsible for invoking and discontinuing individual services, OGSA minimizes the incidence of unneeded services running in the background.
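The lifetime mechanism described above can be illustrated with a minimal Python sketch. The `ServiceInstance` and `Container` classes and the service names are hypothetical, not part of any grid toolkit, and the current time is passed in explicitly (as plain numbers of seconds) so that the example stays deterministic. The point is that an instance whose owner never extends its lifetime, for example because the owner's node has frozen, is reclaimed automatically.

```python
from typing import Dict, List


class ServiceInstance:
    """A running service with a termination time (hypothetical model)."""

    def __init__(self, name: str, lifetime: float, now: float):
        self.name = name
        self.expires_at = now + lifetime

    def extend(self, extra: float, now: float) -> None:
        """Push the termination time out to at least now + extra."""
        if now >= self.expires_at:
            raise RuntimeError(f"{self.name} has already expired")
        self.expires_at = max(self.expires_at, now + extra)


class Container:
    """Central place that creates instances and reclaims expired ones."""

    def __init__(self) -> None:
        self.instances: Dict[str, ServiceInstance] = {}

    def create(self, name: str, lifetime: float, now: float) -> ServiceInstance:
        inst = ServiceInstance(name, lifetime, now)
        self.instances[name] = inst
        return inst

    def sweep(self, now: float) -> List[str]:
        """Terminate every instance whose lifetime has run out."""
        expired = [n for n, i in self.instances.items() if now >= i.expires_at]
        for n in expired:
            del self.instances[n]   # frees the resources the instance held
        return expired


host = Container()
sim = host.create("simulation", lifetime=10, now=0)
host.create("render", lifetime=10, now=0)

sim.extend(10, now=8)            # the client is alive and asks for more time
first = host.sweep(now=12)       # "render" was never extended, so it is reclaimed
second = host.sweep(now=20)      # the extension has now run out as well
print(first, second)
```

Because termination is the default and survival requires an explicit extension, a client that disappears cannot strand a service indefinitely, which is the property the text attributes to set lifetimes.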
State management. All grid services are stateful; that is, they have the ability to maintain dependencies between transactions or requests. This characteristic is in contrast to most Web services mechanisms, which are stateless, meaning that a new connection is opened for every request and then immediately closed after the request is completed. Stateless systems thus treat each request as being independent of all others. While statelessness simplifies system design by doing away with the need to store persistent state details, it limits system functionality and requires, for instance, the reprocessing of authentication details with each request.
The stateful nature of grid services lets a single session persist throughout the service's lifetime, so that, for instance, separately hosted grid services can work together in the absence of a persistent connection. Once a service has completed a task asynchronously, it has the capability to reconnect to its consumer (an application or other grid service) and resume the session. This statefulness also assists in simultaneous support for multiple client connections, as well as recovering from node-specific failures within the grid.
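As a rough illustration of the stateful behavior described above, the following Python sketch keeps per-session state inside a service so that a consumer can submit work, disconnect, and later resume the same session. The `StatefulGridService` class, its method names, and the session-identifier format are assumptions made for this example, and asynchronous execution is simulated by an explicit `work()` call rather than by real background processing.

```python
import itertools
from typing import Dict, List, Tuple


class StatefulGridService:
    """Keeps state per session instead of treating each request as independent."""

    def __init__(self) -> None:
        self._sessions: Dict[str, dict] = {}
        self._ids = itertools.count(1)

    def open_session(self, user: str) -> str:
        """Authenticate once; later requests only need the session id."""
        sid = f"session-{next(self._ids)}"
        self._sessions[sid] = {"user": user, "queued": [], "done": {}}
        return sid

    def submit(self, sid: str, job: str, numbers: List[int]) -> None:
        """Queue a job; the consumer may disconnect after this returns."""
        self._sessions[sid]["queued"].append((job, numbers))

    def work(self) -> None:
        """Simulate asynchronous completion while no consumer is connected."""
        for state in self._sessions.values():
            while state["queued"]:
                job, numbers = state["queued"].pop(0)
                state["done"][job] = sum(numbers)

    def resume(self, sid: str) -> Tuple[str, Dict[str, int]]:
        """Pick the session back up and collect whatever has finished."""
        state = self._sessions[sid]
        return state["user"], dict(state["done"])


service = StatefulGridService()
a = service.open_session("alice")     # two consumers share one service,
b = service.open_session("bob")       # each with its own session state
service.submit(a, "total", [1, 2, 3])
service.submit(b, "total", [10, 20])

service.work()                        # jobs finish with nobody connected

print(service.resume(a))
print(service.resume(b))
```

A stateless equivalent would have to receive the user's identity and the full job context with every request; here that context lives in the session, which is what allows the two consumers to be served side by side and to reconnect later.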