To create scalable enterprise web applications, you need to consider both client- and server-side components. A solid code base, judicious cache implementations, and content acceleration via compression help create an optimal foundation for a high-performance application. This article focuses on the server side, introducing the concept of load balancing from a performance and scalability point of view. You’ll see a high-level review of some common load-balancing schemes, and how load balancing can help scale an application to maintain a high level of performance for end users. This article doesn’t cover the process of setting up load balancing (that’s a subject in itself), but a wide variety of specialized products and books cover the topic.
Consider Yahoo as an example. Yahoo’s portal exposes web applications accessed by millions of users throughout the world. These are dynamic web applications that perform database transactions and render content in real time—more than just images or static HTML content. The end users often visit the same web applications on Yahoo repeatedly, and they expect the same (or better) performance each time. If they don’t get that level of performance, Yahoo risks losing its user base to its competitors. Each user click causes a certain amount of load that goes to the servers. With millions of clicks, this server load multiplies rapidly.
Because a single server has only finite power, it’s imperative to have a collection of servers (a server farm) to handle all the user requests. But each server runs independently, so as the load increases, requests must be spread across multiple servers to maintain the same level of performance for end users. Load balancing is the process that makes it possible to distribute incoming requests across multiple servers.
If one or more servers go down, the load balancer must recognize the changed capacity and redirect requests accordingly. This ability to give end users a seamless, unaffected experience is part of load balancing and is called high availability.
Simply put, load balancing makes it possible to provide a single point of entry for a server farm, but distribute the load across all the servers in the farm. Load balancing is useful not only for HTTP web applications, but also for other applications that use other protocols, such as FTP and chat applications.
Load Balancing Schemes
A typical load-balancing solution has both hardware and software components, but solutions exist that are purely software or hardware based.
- Software-based load balancers run on a server that all the clients connect to. The software listens on a port and determines which server in the server farm should handle each request. A software load balancer acts much like a reverse proxy cache (although usually without the cache), operating on behalf of the servers and forwarding incoming requests to the back-end machines. This implies that the servers themselves cannot be reached directly by the users. Because clients never contact the back-end servers directly, the load balancer in essence provides a layer of security. One open source example is Apache’s mod_proxy_balancer extension.
- Hardware-based load balancers can be based on routing, tunneling or IP translation. These can be complicated, so if you’re considering that route, you should probably plan to use professional help for setup and configuration.
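To make the software-based approach concrete, here’s a minimal sketch in Python of a round-robin TCP forwarder. The back-end host names (`app1.internal`, `app2.internal`) and ports are hypothetical placeholders, and a production balancer would add timeouts, logging, and health checks:

```python
import itertools
import socket
import threading

# Hypothetical back-end servers; a real deployment would read these from config.
BACKENDS = [("app1.internal", 8080), ("app2.internal", 8080)]
_rotation = itertools.cycle(BACKENDS)

def pick_backend():
    """Return the next back-end in round-robin order."""
    return next(_rotation)

def relay(src, dst):
    """Copy bytes one way until the connection closes."""
    while (chunk := src.recv(4096)):
        dst.sendall(chunk)
    dst.close()

def serve(listen_port=8080):
    """Accept client connections and relay each to a chosen back-end."""
    listener = socket.socket()
    listener.bind(("", listen_port))
    listener.listen()
    while True:
        client, _ = listener.accept()
        server = socket.create_connection(pick_backend())
        # Pump bytes in both directions; clients never see the back-end address.
        threading.Thread(target=relay, args=(client, server), daemon=True).start()
        threading.Thread(target=relay, args=(server, client), daemon=True).start()

# To run the proxy: serve(listen_port=8080)
```

Note how the client only ever learns the balancer’s address; the back-end addresses stay internal, which is the security benefit described above.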
Load balancers use a variety of methods to select the server that will service a given request. The method can be as simple as random choice or a round-robin approach; however, more sophisticated solutions, such as those offered by some commercial players, also factor in server load, current traffic conditions, the server’s proximity to the end user, geographic location, recent response times, and so on. For example, if the load balancer knows that one server carries more load than another (perhaps because of its geographic location), it can assign a ratio so that the less-loaded server receives a greater share of the requests.
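A ratio-based selection like the one just described might be sketched as follows; the server names and weights are invented for illustration:

```python
import random

# Hypothetical capacity weights: "big" should absorb three times the
# traffic of "small" (e.g., better hardware or lighter current load).
WEIGHTS = {"big.internal": 3, "small.internal": 1}

def pick_weighted(weights, rng=random):
    """Pick a server with probability proportional to its weight."""
    servers = list(weights)
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]
```

Over many requests, `big.internal` receives roughly three times as many as `small.internal`, matching the configured ratio.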
Many applications require HTTPS (i.e. SSL-enabled connections). These applications are both hardware and software resource intensive because of encryption and decryption overhead. You can mitigate the heavy burden by deploying special hardware that performs the SSL encryption and decryption tasks, reducing the load on the web servers. Similarly, you can assign one server to handle security (user authentication and authorization), decoupling those tasks and leaving the other web servers dedicated solely to handling content requests and responses.
A load balancer can also buffer server responses. When the load becomes high, the load balancer can hand out responses from the buffer or cache, saving server capacity to perform other priority tasks.
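As a sketch of that buffering idea, here’s a toy time-bounded response cache; the TTL value and the interface are assumptions for illustration, not a description of any particular product:

```python
import time

class ResponseCache:
    """A tiny time-bounded response buffer a balancer might consult under load."""

    def __init__(self, ttl=5.0):
        self.ttl = ttl          # seconds a buffered response stays valid
        self._store = {}        # url -> (stored_at, body)

    def put(self, url, body, now=None):
        stored_at = now if now is not None else time.monotonic()
        self._store[url] = (stored_at, body)

    def get(self, url, now=None):
        entry = self._store.get(url)
        if entry is None:
            return None
        stored_at, body = entry
        current = now if now is not None else time.monotonic()
        if current - stored_at > self.ttl:
            del self._store[url]  # stale: force a fresh fetch from a back-end
            return None
        return body
```

While an entry is fresh, the balancer can answer from the buffer and spare a back-end server the work; once it expires, the next request goes through as usual.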
The load balancer can use “health checks” to identify downed or overloaded servers. A server’s health can be diagnosed in either near-real-time or on a scheduled basis. Bad servers can be replaced by good ones manually or automatically depending on the requirements and capabilities of the particular solution.
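A basic health check might be nothing more than a timed TCP connection attempt, as in this sketch (real products use richer probes, such as HTTP status checks, and the timeout here is an arbitrary choice):

```python
import socket

def is_healthy(host, port, timeout=2.0):
    """A minimal TCP health check: can we open a connection quickly?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def prune(servers, check=is_healthy):
    """Keep only the (host, port) pairs that pass the health check."""
    return [(h, p) for h, p in servers if check(h, p)]
```

Running `prune` on a schedule (or near-real-time) yields the current set of good servers, which the balancer then uses for request distribution.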
Large-scale portals face serious threats from attackers, who employ various means to attack sites. For example, programs that continuously poll a URL to mount Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks can create huge server loads. Load balancers can provide features to mitigate or prevent such attacks.
DNS-Based Load Balancing
The Internet’s Domain Name System (DNS) resolves a domain name into an IP address. Round-robin DNS load balancing is a popular method for load balancing that does not require special hardware or software components. To implement DNS load balancing, it’s essential to have a good understanding of DNS and the DNS resolution process.
Imagine that an end user requests the URL www.iberindia.com from a browser. The browser needs to reach the servers behind www.iberindia.com, but it first needs to translate the host name in the URL into the IP address that identifies a server. DNS performs that translation, resolving www.iberindia.com into an IP address.
The browser first checks its cache of resolved addresses. If www.iberindia.com was visited recently, the IP address might be available in the browser’s local DNS cache. In this case, the browser will directly make a connection with the IP address in the cache. When the cache contains more than one IP address for www.iberindia.com, the browser will try them in sequence until one succeeds in making an HTTP connection with the server.
|Author’s Note: The preceding scenario is a generic description only, because browsers can behave differently, and browser behavior can be customized. The browser’s DNS cache is different and distinct from the browser’s HTTP content cache. An end user typically has more control over the HTTP content cache than the DNS cache.
When the IP address is not cached, or when none of the IP addresses work, the browser must obtain a fresh IP address. The browser contacts the operating system to resolve the name into an IP address. (The operating system may maintain a DNS cache of its own, and will return any cached IP address.)
If the operating system does not have a cached IP address for the domain, it issues a query. Figure 1 depicts the chain of flow. The local resolver on the end user’s machine issues a query to a root domain name server. Following referrals from the root server, the resolver then queries the appropriate sub-domain name servers until an address is obtained, which it returns to the browser. If no matching IP address is found, the resolver returns an error code.
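From an application’s point of view, this whole chain is hidden behind a single resolver call. The sketch below uses Python’s standard `socket.getaddrinfo` to fetch every address behind a name and then, browser-like, tries them in order; the three-second timeout is an arbitrary choice:

```python
import socket

def resolve_all(hostname, port=80):
    """Ask the system resolver for every address behind a name."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each result's sockaddr tuple starts with the IP address string.
    return [info[4][0] for info in infos]

def connect_first_working(hostname, port=80):
    """Try the resolved addresses in order, like a browser would."""
    for ip in resolve_all(hostname, port):
        try:
            return socket.create_connection((ip, port), timeout=3.0)
        except OSError:
            continue  # this address failed; fall through to the next one
    raise OSError(f"no reachable address for {hostname}")
```

The resolver (and any caches along the way) decides which addresses come back and in what order, which is exactly the lever that round-robin DNS pulls.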
|Figure 1. Typical DNS Request Flow: Depending on cache availability, DNS requests flow from a client to a local resolver, to a root domain name server, to sub-domain servers, and eventually, back to the client.
With that understanding of DNS in place, let’s go back to the round-robin scheme. A suitable model can manage DNS responses to end user DNS requests to engage a set of servers in round-robin fashion. For example, one way to implement this is to respond to each DNS request with multiple IP addresses. The browser tries the first IP in the DNS response; if that doesn’t work, it tries the next IP, and so on. With each query it answers, the DNS server also rotates the sequence of the returned IPs. Thus, each time a client requests an address, it typically gets a different first IP, effectively distributing load across the set of servers.
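The rotation behavior can be sketched as a toy authoritative responder that shifts its address list after every query; the addresses are invented for illustration:

```python
from collections import deque

class RoundRobinDNS:
    """A toy responder that rotates its address list on every query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        """Return the full address list, then rotate for the next query."""
        response = list(self._addresses)
        self._addresses.rotate(-1)  # the next client sees a different first IP
        return response
```

Successive clients each receive the whole list but with a different address first, so connections spread across the farm even though every client simply "tries the first IP."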
This technique works to distribute load between both web servers and FTP servers. It is most popular in geographic load balancing, where end users might be spread across different parts of the world. For timely request servicing, the servers might also be located in different parts of the world. Obviously, it’s desirable to service a given request using a server located as close as possible to the requesting machine. You can configure this type of load balancing by providing the end user with an IP list that places the local or near-local server first in the list returned by the DNS request. When more than one server is available locally, you can permute the DNS list (and balance the load) by varying the sequence for the next request.
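Putting the local servers first, and permuting among them for successive requests, might look like this sketch (the regions and addresses are hypothetical):

```python
from collections import deque

def geo_order(servers_by_region, client_region, rotation=0):
    """List the client's regional servers first, rotated for balance,
    followed by the servers in all other regions as fallbacks."""
    local = deque(servers_by_region.get(client_region, []))
    local.rotate(-rotation)  # vary which local server appears first
    remote = [ip for region, ips in servers_by_region.items()
              if region != client_region for ip in ips]
    return list(local) + remote
```

A European client gets the European servers at the head of the list (in a varying order), with the remote servers still present in case the local ones fail.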
Unfortunately, the simplicity of this scheme also has a downside. For example, there’s no automatic health check, so if all the servers at a particular location go down, the DNS response for that location would still contain the downed servers’ IPs. In addition, round-robin DNS balances on the sequence of end user requests, not on the actual load each request generates. To overcome these issues, you can implement ways to poll the servers for both availability and load. As you can imagine, such tasks grow in complexity very rapidly, so for large-scale portals, it might be better to rely on a commercial player in this space than to take a do-it-yourself (DIY) approach.
When considering a load balancing solution, look for three things: performance, reliability, and scalability. Then assess the tradeoffs between the various solutions based on your requirements and budget.