Automated Resource Management of Cloud-Based Applications

Software applications rarely experience a uniform workload. It varies from month to month, week to week, day to day, and even between time slots within a single day. An increased workload can cause an application to deviate from its expected behaviour. To perform well, applications should be able to absorb these variations so that users aren’t affected.

Managing resources to optimize performance in a traditional datacenter environment is inefficient because of the delay in reacting to the unexpected behaviour of an application. This delay arises because corrective measures, such as provisioning additional resources for the application, require human intervention. To reduce the delay, and hence optimize performance, there should be a way to manage the resources in the datacenter dynamically. This can be achieved with cloud-based applications, which give us the flexibility to automate resource management.

Introduction

In a traditional datacenter environment, dedicated staff are required to monitor the application and take corrective measures if anything goes wrong. For example, if the workload increases, someone has to introduce one more machine into the environment and add it to the cluster to handle the additional load. This process is time-consuming because it needs human intervention. By the time the corrective measures are implemented, the workload may already have changed again. This approach therefore works only when an increase in workload can be predicted well in advance. For this reason, most infrastructures are over-provisioned to handle any spikes in workload, which results in increased maintenance costs and a largely under-utilized infrastructure.

A cloud-based infrastructure helps minimize this cost by providing the elasticity to handle workload spikes without the overheads that accompany a traditional datacenter environment. To achieve this, it should support automatic management of its resources. In a cloud computing environment, this is done by creating new virtual machines (VMs) on the fly to meet workload spikes; the infrastructure can be scaled up or down simply by creating or terminating virtual machines. Most web servers available today support clustering and load balancing, and some allow a node to be ramped down without losing state such as session information. Choosing an appropriate web server speeds up scaling and hence improves the quality of service.

In this article, we explain how to dynamically manage the resource needs of cloud-based applications. We will see how to use monitored values to decide on an action that meets the Service Level Objectives (SLOs). We illustrate how to automatically trigger an action based on data from a monitoring tool using Drools, and then scale the resources up or down using Xen. Xen is virtualization software that allows computer hardware to run multiple guest operating systems concurrently, which forms the basis of cloud computing. Several cloud solutions use Xen; Eucalyptus is one of them, and we will use it as the provisioning software.

Use-case scenario

Let’s take the example of a web application deployed in a cloud environment. Suppose that in a particular season the number of users accessing the application increases. This in turn increases the load on the servers on which the application is deployed, and the current number of servers can no longer handle the workload. Left unhandled, this can lead to over-utilization of the CPU and gradually degrade the application’s performance. To manage this, we have to add more servers and cluster them with the existing ones so that the load is balanced across all of them. Similarly, in the off-season, when the load on the application decreases, the resources provisioned for the application should be released gracefully to reduce costs.

Let’s have a look at how we can dynamically manage the resources to stabilize the load on the web application to avoid poor performance.

Solution

Consider a web application deployed on a virtual machine that contains an application server. We have used Eucalyptus to set up a private cloud infrastructure. Eucalyptus is an open source cloud platform from Eucalyptus Systems. It enables enterprises to establish their own cloud computing environments.

Let’s see the steps involved in dynamically scaling resources up or down based on server load.

The complete solution can be broken down into three steps:

1. Monitor the virtual machine for any specific metric.
2. Optimize the monitored result and suggest corrective measures.
3. Execute the actions to meet SLOs.

Figure 1 illustrates these steps.

Figure 1: Dynamic resource management solution.

As Figure 1 shows, each virtual machine is equipped with a monitoring agent, which is responsible for collecting the desired metrics from the VM. The monitored data is then sent to an Optimizer. The Optimizer engine smooths out sudden spikes in the workload pattern and decides which action (increase or decrease) to take based on the values coming from the monitoring layer. Its decision is then sent to an Action engine, which performs the actual task of increasing or decreasing the number of virtual machines according to the load.
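The flow in Figure 1 can be sketched as a small control loop. This is only a minimal sketch under stated assumptions, not the implementation used in this article: the ScalingDecision, Optimizer, ActionEngine and ControlLoop names are hypothetical and were introduced here purely for illustration.

```java
import java.util.function.DoubleSupplier;

// Hypothetical types for illustration only; none of these names come from
// Hyperic, Drools or Eucalyptus.
enum ScalingDecision { INCREASE, DECREASE, NONE }

interface Optimizer {
    // Step 2: turns one monitored value into a scaling decision.
    ScalingDecision decide(double metricValue);
}

interface ActionEngine {
    // Step 3: carries out the decision, e.g. by creating or terminating a VM.
    void execute(ScalingDecision decision);
}

final class ControlLoop {
    private final DoubleSupplier monitor; // step 1: supplies the latest metric value
    private final Optimizer optimizer;
    private final ActionEngine actions;

    ControlLoop(DoubleSupplier monitor, Optimizer optimizer, ActionEngine actions) {
        this.monitor = monitor;
        this.optimizer = optimizer;
        this.actions = actions;
    }

    // Runs one monitor -> optimize -> act cycle and returns the decision taken.
    ScalingDecision runOnce() {
        double value = monitor.getAsDouble();
        ScalingDecision decision = optimizer.decide(value);
        if (decision != ScalingDecision.NONE) {
            actions.execute(decision);
        }
        return decision;
    }
}
```

Keeping the three roles behind separate interfaces mirrors the figure: the monitoring tool, the rule engine and the cloud API can each be swapped without touching the other two.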

Let’s look at the solution in detail.

Step 1: Monitoring the virtual machine

Many monitoring tools, such as Hyperic, Nagios, Cacti and Ganglia, are available to monitor physical or virtual machines deployed on the cloud. These tools help collect several useful metrics like CPU utilization, memory used, active thread count, response time and many more. In this use case, we use Hyperic to monitor the CPU utilization of the application servers, as it directly impacts the performance of an application.

Figure 2: Monitoring workflow.

As Figure 2 shows, each virtual machine is equipped with a monitoring agent (Hyperic in our case). The monitoring agent collects data for the enabled metrics and sends it to the Hyperic server. Our monitoring engine then collects the data from the server and passes it on to the optimizer engine. For more details on the monitoring engine, please refer to this tutorial.

Step 2: Optimize the monitored result and suggest corrective measures

Once we get the monitored values, we need to process them into a reliable figure on which the choice of corrective measure can be based.

As Figure 3 shows, the output of the monitoring engine goes to the Optimizer. The optimization process is further broken down into two steps:

a. Averaging the monitored values to remove spikes
b. Suggesting actions using the Drools engine

Figure 3: Optimizer.

a) Averaging the monitored values to remove frequent see-saw variations

Let’s take a use case where we are monitoring the CPU utilization of the application servers on which the web application is hosted. Hyperic is configured to output the metric data every minute. The values coming from the tool may contain short-lived spikes that do not reflect a sustained change in load. So before drawing any conclusion, we need to process this data to obtain a representative metric value. In this example, we average the data every 5 minutes to remove the unwanted spikes.
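The averaging step can be sketched as follows. This is a minimal sketch, assuming one sample per minute and non-overlapping windows of five samples; WindowAverager is a hypothetical helper class and is not part of Hyperic or of the monitoring engine used in this article.

```java
import java.util.OptionalDouble;

// Hypothetical helper: collects per-minute samples and emits one average
// per full window (5 samples = 5 minutes in this article's example).
final class WindowAverager {
    private final int windowSize;
    private double sum = 0.0;
    private int count = 0;

    WindowAverager(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    // Adds one sample. Returns the window average once the window is full,
    // otherwise an empty result. The window is reset after each average.
    OptionalDouble add(double sample) {
        sum += sample;
        count++;
        if (count < windowSize) {
            return OptionalDouble.empty();
        }
        double average = sum / windowSize;
        sum = 0.0;
        count = 0;
        return OptionalDouble.of(average);
    }
}
```

For example, feeding the samples 0.40, 0.45, 0.95, 0.42 and 0.43 into a WindowAverager of size 5 yields an average of 0.53: the single spike to 95% is absorbed, and the averaged value stays below the 60% threshold used later in this article.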

b) Drools Engine

Once we get the processed data by averaging, we pass it to the Drools engine. It takes this processed value as input and suggests the action that needs to be initiated. Drools is a rule engine; its rules are defined in a simple Drools (.drl) file, which specifies the conditions under which an action needs to be initiated. When the conditions are met, the corresponding actions configured in the file are executed. Below is the Drools file and the corresponding Java code to invoke it.

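    import org.apache.log4j.Logger;
    import org.poc.cloud.Action;

    // NOTE: "logger" is not declared in this file; it has to be made available
    // to the rules (for example as a Drools global) before they can run.
    // NOTE: the patterns below match Double facts, whereas the Java code in
    // Figure 5 inserts a RuleInput object. The type of the inserted fact must
    // match the pattern for a rule to fire.

    rule "Increase VM Rule"
        when
            $metricValue: Double()
            eval(checkValue($metricValue.getDoubleValue()) > 60);
        then
            logger.info("Cpu-Utilization shoot beyond upper-threshold");
            Action action = new Action();
            action.increaseVM();
    end

    rule "Decrease VM Rule"
        when
            $metricValue: Double()
            eval(checkValue($metricValue.getDoubleValue()) < 30);
        then
            logger.info("Cpu-Utilization goes below the lower-threshold");
            Action action = new Action();
            action.decreaseVM();
    end

    // Converts a fraction (e.g. 0.62) into a percentage (62).
    function double checkValue(Double num1) {
        num1 = num1 * 100;
        return num1;
    }

Figure 4: Optimizer.drl file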

The following Java code calls the rule adapter, which internally invokes the Drools file.

    RuleInput ruleInput = new RuleInput();
    // averaged value as an input to the optimizer
    ruleInput.setMetricInputValue(0.62);
    ruleInput.setMetricName("CPU-Utilization");

    RuleAdapter ruleAdapter = (RuleAdapter) getContext().getBean("ruleAdapter");
    ruleAdapter.runRules(new String[] {"Optimizer.drl"},
                         new Object[] {ruleInput});

Figure 5: Java code to call rule adapter

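    import java.util.Collection;

    import org.apache.log4j.Logger;
    import org.drools.KnowledgeBase;
    import org.drools.KnowledgeBaseFactory;
    import org.drools.builder.KnowledgeBuilder;
    import org.drools.builder.KnowledgeBuilderFactory;
    import org.drools.builder.ResourceType;
    import org.drools.definition.KnowledgePackage;
    import org.drools.io.ResourceFactory;
    import org.drools.runtime.StatefulKnowledgeSession;

    public class RuleAdapter {

        static Logger logger = Logger.getLogger(RuleAdapter.class.getName());

        public RuleAdapter() {
        }

        // Compiles the given rule files, inserts the facts and fires all rules.
        public void runRules(String[] rules, Object[] facts) throws Exception {
            KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase();
            KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();

            // Load each DRL file from the classpath.
            for (int i = 0; i < rules.length; i++) {
                String ruleFile = rules[i];
                logger.info("Loading file: " + ruleFile);
                kbuilder.add(ResourceFactory.newClassPathResource(ruleFile, RuleAdapter.class),
                             ResourceType.DRL);
            }

            // Fail early if any rule file did not compile.
            if (kbuilder.hasErrors()) {
                throw new IllegalStateException(kbuilder.getErrors().toString());
            }

            Collection<KnowledgePackage> pkgs = kbuilder.getKnowledgePackages();
            kbase.addKnowledgePackages(pkgs);

            StatefulKnowledgeSession ksession = kbase.newStatefulKnowledgeSession();
            try {
                for (int i = 0; i < facts.length; i++) {
                    Object fact = facts[i];
                    logger.info("Inputting values to drl file: " + fact);
                    ksession.insert(fact);
                }
                ksession.fireAllRules();
            } finally {
                // Release the resources held by the session.
                ksession.dispose();
            }
        }
    }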

Figure 6: Rule adapter class which will invoke drools file.

The RuleAdapter class invokes the rule file with a RuleInput object as its input. The RuleInput object consists of a metric value and a metric name.

Optimizer.drl file:

This Drools file contains two rules:

    1. Increase VM Rule -- applies when the metric value exceeds the upper threshold.
    2. Decrease VM Rule -- applies when the metric value falls below the lower threshold.

1) Increase VM Rule: During peak hours, when the load increases, CPU utilization also increases. Over-utilization of the CPU results in poor performance, which might drive users away from the application. To meet the application’s SLOs, we should therefore add more application servers to the environment once utilization crosses the permissible value. This permissible value is configured in the Drools file. Let’s assume the threshold value for utilization is 60%. In the Drools file, we have specified a rule that adds one virtual machine when the metric value goes beyond 60% by calling the increaseVM() action.

2) Decrease VM Rule: During off-peak hours, when the load is very low, the servers are under-utilized. If the average utilization of the servers goes below a specified threshold, virtual machines can be released so that the remaining instances take over their load. Here, we have configured the lower threshold value for utilization as 30%. In the Drools file, we have specified a rule that removes one virtual machine whenever the metric value goes below 30% by calling the decreaseVM() action.
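To check the thresholds without a rule engine, the two rules can be mirrored in plain Java. This is a sketch that is equivalent in intent to Optimizer.drl, not a replacement for it; ThresholdRules and Decision are hypothetical names, and the input is assumed to be a fraction between 0 and 1, like the 0.62 passed in Figure 5.

```java
// Hypothetical plain-Java mirror of the two rules in Optimizer.drl.
final class ThresholdRules {
    enum Decision { INCREASE_VM, DECREASE_VM, NO_ACTION }

    static final double UPPER_THRESHOLD_PERCENT = 60.0;
    static final double LOWER_THRESHOLD_PERCENT = 30.0;

    private ThresholdRules() {
    }

    // Maps an averaged utilization (fraction, e.g. 0.62) to a decision.
    static Decision decide(double averagedUtilization) {
        // Same scaling as checkValue() in the rule file: fraction -> percent.
        double percent = averagedUtilization * 100;
        if (percent > UPPER_THRESHOLD_PERCENT) {
            return Decision.INCREASE_VM;
        }
        if (percent < LOWER_THRESHOLD_PERCENT) {
            return Decision.DECREASE_VM;
        }
        // Between the two thresholds nothing is changed.
        return Decision.NO_ACTION;
    }
}
```

The gap between the two thresholds matters: any value between 30% and 60% triggers no action, which keeps the system from adding and removing VMs in quick succession when utilization hovers around a single cut-off.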

Step 3: Action Execution

Based on the metric value and the configured thresholds, the Drools file invokes an action. If CPU utilization goes beyond the upper threshold, we should add one or more VMs to handle the load. Similarly, if it goes below the lower threshold, the extra VMs should be removed.

Figure 7: Action Engine.

Based on the optimized value, the Optimizer suggests an action to the Action engine. To carry it out, the actions interact with Eucalyptus through its API to create or destroy virtual machines (refer to Figure 7).

Let’s dive into the code to see how to achieve this programmatically. As Figure 4 (the Optimizer.drl file) shows, an Action class is invoked based on the metric value. That class has the methods increaseVM() and decreaseVM().

Before making calls to the Eucalyptus API, we need to set some system properties.

    System.setProperty("euca.var.dir", "PATH_TILL_FOLDER-var");
    System.setProperty("euca.conf.dir", "PATH_TILL_FOLDER-cloud.d");

Figure 8: Setting system properties.

increaseVM() -- This method is called when the value crosses the permissible upper threshold limit. It creates one more virtual machine.

    public void increaseVM() {
        String eucaURL = "http://10.66.127.30:8773/services/Eucalyptus";
        ClientPool cp = Defaults.getOneoffClientPool(eucaURL);
        BasicClient client = (BasicClient) cp.borrowObject();

        EucalyptusMessage request = new RunInstancesType();
        request.setUserId("admin");
        request.setProperty("kernelId", "KERNEL_ID_OF_EUCALYTPTUS_SYSTEMS");
        request.setProperty("instanceType", "SIZE_OF_INSTANCE_TO_CREATED");
        request.setProperty("keyName", "KEY_NAME_TO_ADD_MORE_SECURITY");
        request.setProperty("imageId", "IMAGE_NAME");
        request.setProperty("minCount", "MINIMUM_COUNT_OF_VM_TO_BE_CREATED");
        request.setProperty("maxCount", "MAXIMUM_COUNT_OF_VM_TO_BE_CREATED");
        request.setProperty("addressingType", "ADDRESSING_TYPE");

        EucalyptusMessage reply = client.send(request);
        System.out.println("RunInstancesType reply = " + reply);
    }

Figure 9: increaseVM() method code snippet.

The code is largely self-explanatory. To create a new VM, we build a request as shown above and set a few parameters on it.

The instance type refers to the size of the VM to be created. Some of the allowed values are 'm1.small', 'c1.medium' and 'm1.large'.

We can create more than one VM in a single request by specifying the maximum count.

decreaseVM() -- This method is called when the value goes below the lower threshold. In this case, we terminate a VM.

    public void decreaseVM() {
        String eucaURL = "http://10.66.127.30:8773/services/Eucalyptus";
        ClientPool cp = Defaults.getOneoffClientPool(eucaURL);
        BasicClient client = (BasicClient) cp.borrowObject();

        EucalyptusMessage request = new TerminateInstancesType();
        List instancesToBeTerminated = new ArrayList();
        instancesToBeTerminated.add("VM_INSTANCE_ID_TO_BE_TERMINATED");
        request.setProperty("instancesSet", instancesToBeTerminated);

        EucalyptusMessage reply = client.send(request);
        if ((Boolean) reply.getProperty("_return")) {
            System.out.println("Selected Instances terminated successfully!");
        } else {
            System.out.println("Selected Instances not terminated !");
        }
        System.out.println("TerminateInstancesType reply:: " + reply);
    }

Figure 10: decreaseVM() method code snippet.

In this method, we pass the instance ID of the VM to be terminated in a EucalyptusMessage request. Here again, we can terminate more than one VM in a single request by adding several instance IDs to the list.
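The snippets above use a placeholder for the instance ID to terminate. One way to obtain that ID is to remember the IDs of the VMs that were launched. The following is only a sketch under stated assumptions: Provisioner and InstanceTracker are hypothetical names that stand in for the Eucalyptus calls shown above, and the minimum instance count is an added safeguard that is not part of the original code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical abstraction over the cloud API; in this article that role is
// played by the Eucalyptus calls inside increaseVM() and decreaseVM().
interface Provisioner {
    String launchInstance();             // returns the ID of the new instance
    void terminateInstance(String id);
}

// Hypothetical helper that remembers which instances were launched.
final class InstanceTracker {
    private final Provisioner provisioner;
    private final int minInstances;      // assumption: never scale below this
    private final Deque<String> instanceIds = new ArrayDeque<>();

    InstanceTracker(Provisioner provisioner, int minInstances) {
        this.provisioner = provisioner;
        this.minInstances = minInstances;
    }

    // Launches one VM and remembers its ID.
    String scaleUp() {
        String id = provisioner.launchInstance();
        instanceIds.push(id);
        return id;
    }

    // Terminates the most recently launched VM, unless that would drop the
    // pool to fewer than minInstances. Returns true if a VM was terminated.
    boolean scaleDown() {
        if (instanceIds.size() <= minInstances) {
            return false;
        }
        provisioner.terminateInstance(instanceIds.pop());
        return true;
    }

    int size() {
        return instanceIds.size();
    }
}
```

Terminating the most recently launched instance first is just one possible policy; an implementation could equally pick the least loaded instance, provided its sessions are drained first.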

We have used Hyperic for collecting data, but other monitoring tools, such as those mentioned in Step 1, could be used instead.

Similarly, for cloud computing environments, there are providers like VMware and Platform Computing that offer features such as hot update of a VM and hot migration of a VM from one host to another. An appropriate provider can be chosen based on requirements.

Conclusion

Here, we have illustrated how to achieve simple dynamic resource management in a cloud computing environment. Increasing or decreasing the VM count based on CPU utilization is one way of managing resources dynamically. In real scenarios, the rules for increasing or decreasing the VM count will be more complex and take many factors into consideration. Further, an application can perform poorly for several other reasons, such as increased thread count, heap memory size, poor application design or network overheads. These call for other measures, like pooling valuable resources or optimizing I/O operations; merely adding virtual machine instances will not solve such problems. Finding an optimal solution is therefore a big challenge in itself and should be well researched before finalizing any action.
