AWS: Using Redundant EC2 Instances to Increase Availability

This article is excerpted from Amazon Web Services in Action, published by Manning Publications.

Unfortunately, EC2 instances aren't fault-tolerant. Under your virtual server is a host system, and a problem with that host can crash your virtual server. Here are a few examples:

  • If the host hardware fails, it can no longer host the virtual server on top of it.
  • If the network connection to or from the host is interrupted, the virtual server can no longer communicate over the network either.
  • If the host system is disconnected from a power supply, the virtual server also goes down.

But the software running on top of the virtual server may also cause a crash:

  • If your software has a memory leak, you'll run out of memory. It may take a day, a month, a year, or more, but eventually it will happen.
  • If your software writes to disk and never deletes its data, you'll run out of disk space sooner or later.
  • Your application may not handle edge cases properly and may simply crash instead.

Regardless of whether the host system or your software is the cause of a crash, a single EC2 instance is a single point of failure. If you rely on a single EC2 instance, your system will blow up: the only question is when.

Redundancy can remove a single point of failure

Imagine a production line that makes fluffy cloud pies. Producing a fluffy cloud pie requires several production steps (simplified!):

  1. Produce a pie crust.
  2. Cool down the pie crust.
  3. Put the fluffy cloud mass on top of the pie crust.
  4. Cool the fluffy cloud pie.
  5. Package the fluffy cloud pie.

The current setup is a single production line. The big problem with this setup is that whenever one of the steps crashes, the entire production line must be stopped. Figure 1 illustrates the problem when the second step (cooling the pie crust) crashes: the following steps stop working too, because they no longer receive cooled pie crusts.


Figure 1. A single point of failure affects not only itself, but the entire system.

Why not have multiple production lines? Instead of one line, suppose we have three. If one of the lines fails, the other two can still produce fluffy cloud pies for all the hungry customers in the world. Figure 2 shows the improvements; the only downside is that we need three times as many machines.


Figure 2. Redundancy eliminates single points of failure and makes the system more stable.
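To put rough numbers on figures 1 and 2, the short Python sketch below assumes that every production step fails independently and has the same availability `a` (an assumption: real failures are often correlated, for example through a shared power supply). The five steps of one line work in series, so a line is up only when every step is up. Redundant lines work in parallel, so the system is down only when every line is down. The function names and the value 0.99 are illustrative, not taken from the book.

```python
def series_availability(a: float, steps: int) -> float:
    """Availability of a chain that works only if every step works (figure 1)."""
    return a ** steps


def parallel_availability(a: float, copies: int) -> float:
    """Availability of redundant copies: down only if every copy is down (figure 2)."""
    return 1 - (1 - a) ** copies


if __name__ == "__main__":
    step = 0.99  # hypothetical availability of a single production step
    one_line = series_availability(step, 5)
    three_lines = parallel_availability(one_line, 3)
    print(f"one line:    {one_line:.4f}")
    print(f"three lines: {three_lines:.5f}")
```

With the invented per-step value of 0.99, a single line is up only about 95% of the time, while three independent lines are up more than 99.9% of the time. The same reasoning applies to three EC2 instances, provided their failures really are independent.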

The example transfers directly to EC2 instances. Instead of running your software on a single EC2 instance, you can run it on three. If one of those instances crashes, the other two can still serve incoming requests. You can also limit the extra cost: instead of one large EC2 instance, you can choose three small ones. A dynamic server pool raises a new problem, though: how do clients communicate with instances that come and go? The answer is decoupling: put a load balancer between your EC2 instances and the requester, or let the instances consume their work from a message queue. Read on to learn how this works.

Redundancy requires decoupling

Figure 3 shows how EC2 instances can be made fault-tolerant by using redundancy and synchronous decoupling. If one of the EC2 instances crashes, Elastic Load Balancing (ELB) stops routing requests to the crashed instance. The auto-scaling group replaces the crashed EC2 instance within minutes, and ELB begins routing requests to the new instance.


Figure 3. Fault-tolerant EC2 servers with an auto-scaling group and ELB
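The behavior in figure 3 can be modeled with a toy simulation. The Python sketch below is not the AWS API: `LoadBalancer`, `AutoScalingGroup`, and `reconcile()` are hypothetical stand-ins. The balancer routes round-robin among healthy instances only, and the group drops crashed instances and launches replacements until it is back at its desired capacity, which is roughly the job ELB and an auto-scaling group do for you.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


class LoadBalancer:
    """Toy balancer: round-robin over healthy instances only (not the AWS API)."""

    def __init__(self):
        self.instances = []  # registered Instance objects
        self._counter = 0

    def route(self) -> str:
        healthy = [i for i in self.instances if i.healthy]
        if not healthy:
            raise RuntimeError("no healthy instances to route to")
        target = healthy[self._counter % len(healthy)]
        self._counter += 1
        return target.instance_id


class AutoScalingGroup:
    """Toy group that keeps `desired` healthy instances registered with the balancer."""

    def __init__(self, lb: LoadBalancer, desired: int):
        self.lb = lb
        self.desired = desired
        self._ids = itertools.count(1)
        self.reconcile()

    def reconcile(self) -> None:
        # Drop crashed instances from the pool (the "terminate" step).
        self.lb.instances = [i for i in self.lb.instances if i.healthy]
        # Launch replacements until the desired capacity is reached again.
        while len(self.lb.instances) < self.desired:
            self.lb.instances.append(Instance(f"i-{next(self._ids)}"))


if __name__ == "__main__":
    lb = LoadBalancer()
    asg = AutoScalingGroup(lb, desired=3)  # launches i-1, i-2, i-3
    lb.instances[1].healthy = False        # i-2 crashes
    print(sorted({lb.route() for _ in range(6)}))  # traffic goes to i-1 and i-3 only
    asg.reconcile()                        # i-2 is replaced by i-4
    print(sorted({lb.route() for _ in range(6)}))
```

Requests keep flowing while the crashed instance is out of the pool; the group only has to restore capacity, not rescue in-flight traffic.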

Take a second look at figure 3 and see what parts are redundant:

  • Availability zones (AZs): Two are used. If one AZ goes down, we still have EC2 instances running in the other AZ.
  • Subnets: A subnet is tightly coupled to an AZ. Therefore we need one subnet in each AZ, so the subnets are redundant as well.
  • EC2 instances: We have multiple layers of redundancy for EC2 instances. We have multiple instances in a single subnet (AZ), and we have instances in two subnets (AZs).
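The redundancy in the list above is something you declare when you create the auto-scaling group. The sketch below only builds the keyword arguments for boto3's `create_auto_scaling_group` call, so it runs without AWS credentials. The group name, launch template, subnet IDs, and target group ARN are hypothetical placeholders, as are the capacity of four and the 300-second grace period. It also assumes the load balancer is an Application Load Balancer with a target group; a Classic Load Balancer would be attached through `LoadBalancerNames` instead. Passing one subnet per AZ in `VPCZoneIdentifier` lets the group spread its instances across both AZs.

```python
def asg_params(name, launch_template, subnet_ids, target_group_arns, desired=4):
    """Build kwargs for boto3 autoscaling create_auto_scaling_group (hypothetical helper)."""
    if len(subnet_ids) < 2:
        # One subnet means one AZ, which is a single point of failure again.
        raise ValueError("use at least two subnets in different AZs")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateName": launch_template, "Version": "$Latest"},
        # Fixed pool size: the group only replaces instances, it does not scale.
        "MinSize": desired,
        "MaxSize": desired,
        "DesiredCapacity": desired,
        # Comma-separated subnet IDs, one subnet per availability zone.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "TargetGroupARNs": list(target_group_arns),
        # Replace instances that fail the load balancer's health check.
        "HealthCheckType": "ELB",
        # Seconds after launch before health checks count (assumed value).
        "HealthCheckGracePeriod": 300,
    }


if __name__ == "__main__":
    params = asg_params(
        "pie-servers",                     # hypothetical group name
        "pie-server-template",             # hypothetical launch template
        ["subnet-aaaa1111", "subnet-bbbb2222"],  # hypothetical subnets in two AZs
        ["arn:aws:elasticloadbalancing:...:targetgroup/pies/..."],  # placeholder ARN
    )
    print(params)
```

With credentials configured, `boto3.client("autoscaling").create_auto_scaling_group(**params)` would submit the request. Setting `HealthCheckType` to `"ELB"` means instances that fail the load balancer's health check are replaced, not only instances whose EC2 status checks fail.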

Figure 4 shows a fault-tolerant system built with EC2 that uses the power of redundancy and asynchronous decoupling to process messages from an SQS queue.


Figure 4. Fault-tolerant EC2 servers with an auto-scaling group and SQS
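On the SQS side of figure 4, each EC2 instance runs a worker loop. The Python sketch below shows one polling step against a boto3 SQS client; `receive_message` and `delete_message` are real client methods, while the function name `poll_once` and the `handle` callback are illustrative. The key to fault tolerance is that a message is deleted only after it has been processed successfully: if the handler fails, or the whole instance crashes midway, the message is never deleted, and SQS makes it visible again after the queue's visibility timeout so another instance can pick it up.

```python
from typing import Any, Callable


def poll_once(sqs: Any, queue_url: str, handle: Callable[[str], None]) -> int:
    """Receive a batch from SQS, process it, and delete only the messages that succeeded.

    Returns the number of messages processed and deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling: wait up to 20 s for messages
    )
    processed = 0
    # "Messages" is missing from the response when the queue is empty.
    for message in response.get("Messages", []):
        try:
            handle(message["Body"])
        except Exception:
            # Do not delete: the message becomes visible again after the
            # visibility timeout, so this or another instance can retry it.
            continue
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        processed += 1
    return processed
```

In a real worker you would create the client once with `boto3.client("sqs")` and call `poll_once` in an endless loop. Because every instance runs the same loop against the same queue, losing one instance only reduces throughput; no message is lost.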

In both figures, the load balancer / SQS queue appears only once. This doesn't mean ELB or SQS is a single point of failure; on the contrary, ELB and SQS are fault-tolerant by default.

For source code, sample chapters, the Online Author Forum, and other resources, go to Amazon Web Services in Action. Use code wittiged to get 39% off the purchase price of the book in any format.
