by Leon Fayer
Every web-based application is part of a much broader ecosystem; it is an integral component of a larger business, operates within a highly complex and interconnected technical environment, and aspects of its management are dispersed across multiple organizational units. It is designed and built by developers, and operated by systems administrators to meet specific objectives set by the business. Determining that the application is operating effectively within this environment requires a monitoring strategy that meets the needs of multiple constituents and goes beyond simply tracking technical metrics for uptime, response times and capacity.
Case 1: The Impact of Ineffective Monitoring
A retailer’s website displays content-appropriate ads sourced from an ad network on its product pages. Business analysts monitor the number of completed sales transactions, while operations monitors uptime and server performance. Due to inefficient application design, these ads load ahead of the retailer’s own content as the page builds. This design is not an issue as long as the ad network is performing well. But one day, the business analysts notice a sharp drop in completed transactions. A quick call to the systems administrator reveals that it’s not an operations issue; the web and application servers are up and performing properly. The underlying problem is that performance issues on the part of the ad supplier are slowing page loads to the point that buyers are abandoning transactions (see Figure 1). Unfortunately, this aspect of performance is not monitored. The website appears operationally fine, but the retailer is losing revenue.
Figure 1: Effect of third-party ad service latency on website load time
Challenge 1: Aligning Objectives
In many businesses, multiple organizations oversee the operation of a given web application. Systems administrators manage the health of the system environment, developers are responsible for application health and business analysts ensure business goals are met. Each of these areas has its own focus and objectives, relies on its own metrics to verify that its goals are met, and may have very little insight into how its activities impact other aspects of the application’s management. For instance, systems administrators in the operations area may have little knowledge about the application-specific functionality or how changes to the systems environment could affect that functionality. Similarly, business personnel seeking to better achieve their own goal, such as increasing registrations, may unleash a promotional campaign that overwhelms system capacity. More importantly, as shown in Case 1, ensuring the application meets one organization’s goal does not necessarily mean the application is meeting its larger business objectives. Unless monitoring efforts are aligned, the end result is inefficient teamwork and cross-departmental finger-pointing.
Challenge 2: Ensuring Information Is Actionable
While collecting basic capacity, network traffic and application performance metrics is essential for effectively managing a web application, these metrics provide only a partial picture of the application’s true status to its constituents and are rarely truly actionable on their own. For example, if the volume of hits for a given web page suddenly goes up or down, what does that mean? Does that change have any business impact? Is it indicative of a technical issue or the effects of a social media campaign? Is it merely interesting or does it require immediate attention? And if so, what is the proper course of action? Answering these questions requires context that can be supplied by other monitored metrics. Knowing that the network, server and storage environments were operating properly at the time, that application modifications were deployed just prior to the change, and that completed business transactions simultaneously dropped supplies the necessary context to quickly answer these questions.
Challenge 3: Focusing on the Right Information
Collecting multiple business and technical metrics helps to ensure the necessary information is available when needed for troubleshooting or to follow trends. Ongoing monitoring can, and should, collect large volumes of data for analysis. Further, any metric worth reviewing should also be used for graphing and trending. Correlations between events are not always obvious in real time, but become apparent when viewed historically. Access to historical data also presents a clear picture of what “right” looks like, making it far easier to determine when and what is going wrong. Understanding the relationships between business and technical metrics helps to determine which metrics truly matter. Further, it accelerates troubleshooting by enabling issues identified through a business metric to be mapped to the related technical metrics (and vice versa). More often than not, as shown in Figure 2, interpreting the reason behind a change in one metric requires following a chain of impact through multiple metrics.
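As a minimal sketch of how historical data can expose a relationship between a technical metric and a business metric, the following computes a Pearson correlation coefficient over two hourly series. The series names and sample values are hypothetical and chosen only for illustration; a real program would read them from whatever monitoring store is in use.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series has no variation to correlate
    return cov / sqrt(var_x * var_y)


# Hypothetical hourly samples (assumed values, not measured data).
page_load_seconds = [1.1, 1.2, 1.0, 1.3, 3.8, 4.2, 4.0, 1.2]
completed_sales = [52, 49, 55, 50, 21, 17, 19, 51]

r = pearson(page_load_seconds, completed_sales)
print(f"correlation between load time and sales: {r:.2f}")
```

A coefficient near -1 or +1 suggests the two metrics move together and are worth mapping to each other; it does not by itself establish which one drives the other.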
Figure 2: The Relationship Between Monitoring Metrics and Organizational Views
An effective monitoring program addresses the challenges described above by taking a holistic view of web applications, their environments and the businesses they serve. It does so by capturing, relating and displaying metrics relevant to its multiple stakeholders in a way that provides a “big picture” of overall performance as it impacts the business. This approach supports the individual needs of developers, network engineers, administrators, business analysts and executives, while providing the context to align their actions to a common set of business objectives. Whether merging existing initiatives or launching a new monitoring program, the following four steps will enable web businesses to significantly increase the value gained from their monitoring efforts.
Broaden the Scope of Your Monitoring Program
The first step toward establishing an effective monitoring program is to collect and combine business and technical monitoring metrics. Expanding existing technical monitoring programs to include full coverage of critical business metrics ensures your web-enabled business, as well as the technology that supports it, is functioning properly. For example, a web business that relies on site registration (beta sign-up pages, membership sites) may want a business check to make sure the hourly number of registrations does not drop below a set threshold. Similarly, an e-commerce application could benefit from a business metric that monitors the credit card transaction success vs. failure ratio to ensure the sales process works as expected. Metrics such as these not only support monitoring and troubleshooting efforts; they also supply considerable business intelligence to enhance decision-making and optimize business performance going forward.
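The two business checks described above could be sketched roughly as follows. The function names, the sample dictionary keys and the threshold values (25 registrations per hour, a 5% failure ratio) are assumptions made for illustration; appropriate thresholds depend on each business.

```python
def card_failure_ratio(successes, failures):
    """Fraction of card transactions that failed (0.0 when there were none)."""
    total = successes + failures
    if total == 0:
        return 0.0
    return failures / total


def run_business_checks(sample, min_registrations=25, max_failure_ratio=0.05):
    """Return a list of alert messages for one hourly sample.

    `sample` is a hypothetical dict with the keys 'registrations',
    'card_successes' and 'card_failures'.
    """
    alerts = []
    if sample["registrations"] < min_registrations:
        alerts.append(
            f"registrations {sample['registrations']} below minimum {min_registrations}"
        )
    ratio = card_failure_ratio(sample["card_successes"], sample["card_failures"])
    if ratio > max_failure_ratio:
        alerts.append(f"card failure ratio {ratio:.1%} above {max_failure_ratio:.1%}")
    return alerts


# Assumed sample values for a healthy hour and a problem hour.
healthy_hour = {"registrations": 40, "card_successes": 190, "card_failures": 4}
problem_hour = {"registrations": 12, "card_successes": 150, "card_failures": 50}

print(run_business_checks(healthy_hour))
print(run_business_checks(problem_hour))
```

Checks like these sit alongside the usual uptime and capacity checks rather than replacing them.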
Set Business Objectives
The primary focal point of a holistic monitoring initiative is ensuring that web applications are efficiently and effectively achieving their business goals. Clearly identifying these objectives establishes priorities, enables monitoring metrics to be evaluated based on their contributions to the business, and aligns organizational actions to a common goal. It focuses discussions about applications, servers, web properties and other infrastructure/systems first and foremost on making sure that business goals are satisfied. Questions to ask when determining an application’s business objectives include:
- What specific benefits/results is the application supposed to deliver for the business? (Build a list of objectives.)
- How do you determine if those benefits/results are being achieved? (Identify measures for each objective.)
- What are the current and future performance expectations for each objective? (Set monitoring and trend targets.)
Map Business and Technical Metrics
Once the business objectives are determined, the next step is to identify the technical metrics that affect the associated business metrics. Mapping the relationships between technical metrics and business objectives adds context and highlights the mutual impacts. For instance, the credit card transaction success vs. failure ratio measured by an e-commerce business depends on the availability of the application server, connectivity to a third-party authorization service, and availability and performance of that service itself. If an outage of the authorization service occurs, it will translate into a higher ratio of card transaction failures. Conversely, a higher ratio of card transaction failures may be caused by an issue highlighted by one of the associated technical metrics. Complete this step by determining the sources for the metrics to be monitored. They may be available from one or more existing monitoring tools or may require obtaining a new tool or monitoring service.
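One simple way to record such a mapping is a lookup table that can be queried in both directions. In the sketch below, the metric names are hypothetical labels; the entries for the card transaction ratio follow the example above, while the registration entry is an assumption added only to show a second business metric.

```python
# Business metric -> technical metrics it depends on (names are illustrative).
METRIC_MAP = {
    "card_failure_ratio": [
        "app_server_availability",
        "auth_service_connectivity",
        "auth_service_availability",
        "auth_service_performance",
    ],
    # Assumed mapping, not taken from the text.
    "hourly_registrations": [
        "app_server_availability",
        "page_load_time",
    ],
}


def technical_metrics_for(business_metric):
    """Technical metrics to inspect when a business metric changes."""
    return METRIC_MAP.get(business_metric, [])


def business_metrics_affected_by(technical_metric):
    """Business metrics that may move when a technical metric degrades."""
    return sorted(
        business
        for business, technical in METRIC_MAP.items()
        if technical_metric in technical
    )


print(technical_metrics_for("card_failure_ratio"))
print(business_metrics_affected_by("app_server_availability"))
```

Keeping the map in one shared place gives operations and business staff the same starting point when a metric changes.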
Implement a Dashboard
The final step in the process is to assemble a dashboard to support effective monitoring. To create a holistic view, pick the most important business and technical metrics to display together. Seeing multiple metrics simultaneously provides a quick view of overall system health as well as supplying the context needed to rapidly chase down issues (see Figure 3). Following the previous example, seeing an increase in card transaction failures along with network connectivity issues with the third-party authorization service prioritizes and directs remediation efforts by quickly ruling out other potential issues. Using charts and graphs on the dashboard to display trending information provides additional means to identify and remediate issues. For example, rising transaction rates over time can provide early notice of future capacity constraints. Similarly, knowing that a web application averages 200 transactions an hour on a typical day, with a normal range between 110 and 350 transactions, will prompt an analyst to research a spike of over 500 transactions in a given hour.
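The normal-range example can be expressed as a small check. This is a sketch only: it treats the lowest and highest values in a hypothetical history as the normal range, which is one of several reasonable choices (percentiles or standard deviations are common alternatives), and the history values are assumed for illustration.

```python
def normal_range(history):
    """Use the lowest and highest historical values as the normal range."""
    if not history:
        raise ValueError("history must not be empty")
    return min(history), max(history)


def classify(value, low, high):
    """Label a new reading relative to the normal range."""
    if value < low:
        return "below normal"
    if value > high:
        return "above normal"
    return "normal"


# Hypothetical hourly transaction counts for typical days (assumed values).
history = [110, 150, 180, 200, 210, 240, 350, 160]
low, high = normal_range(history)

for reading in (200, 520, 90):
    print(reading, classify(reading, low, high))
```

A dashboard would typically draw the range as a band behind the live graph so that readings outside it stand out.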
Figure 3: A Holistic Monitoring Dashboard
Monitoring in Action
Case 2: Identifying a Problem Before It Happens
One of the world’s largest entertainment websites was relying on third-party software to power its customer support workflow. The vendor released a minor patch to the software, and IT performed a routine upgrade to the system. After the installation of the patch, all the regression tests passed, validating software readiness and functionality. However, the trends on the monitoring dashboard showed that the database connections from the upgraded software significantly increased the load on the database, risking an impact on other core business operations in the near future if the load continued at the same rate (see Figure 4). Tens of millions of users could be affected, and a significant portion of the revenue could be lost as a result. Following its contingency plan, IT rolled back the patch and notified the vendor about a critical bug in the software. The patch was hot-fixed, correcting the issue, and went to production without impacting any of the end users.
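The kind of projection involved in this case can be sketched with a straight-line fit over recent load samples. The case does not describe how the team made its estimate, so the least-squares approach, the function names and the sample values below are all assumptions for illustration.

```python
def linear_trend(values):
    """Least-squares slope and intercept for evenly spaced samples."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def samples_until_capacity(values, capacity):
    """Estimate how many more sampling intervals pass before the fitted
    trend reaches `capacity`. Returns None if the load is not rising."""
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    crossing = (capacity - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


# Hypothetical database connection counts sampled hourly after an upgrade.
connections = [100, 110, 120, 130]
print(samples_until_capacity(connections, capacity=200))
```

A straight line is a rough model; it is useful mainly as an early warning that a metric is heading toward a known limit.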
Figure 4: Increase in database load caused by the software upgrade
Case 3: Holistic Troubleshooting in Action
A large e-commerce company operating a very complex system with multiple revenue generation points, supporting 80 million users and 1 billion annual transactions, discovered that revenue had dropped lower than the projected trend. Fortunately, the company’s monitoring software supplied sufficient data to target troubleshooting. Reviewing a trend graph of revenue showed a clear drop in revenue beginning on a specific date. Combining this graph with web traffic trends showed a simultaneous drop in traffic (see Figure 5). This information indicated that the revenue drop was associated with the traffic drop, rather than with application or payment processor issues that prevented customers from completing transactions.
The next step applied trend data on load times, database health and CPU usage. These metrics did not deviate from their norms, enabling the troubleshooter to eliminate the underlying platform as an issue without the need to perform deep dives into the health of individual components. With systems issues eliminated, the next action was to evaluate marketing campaign performance.
The company sends out tens of millions of emails a day to attract new users and, subsequently, generate new conversions. If the campaigns are ineffective or the emails fail to reach their targets, traffic to the site slows, leading to a decrease in the number of transactions. Examining a trend graph of email bounce rates showed that bounce rates skyrocketed at the same time as the drops in traffic and revenue occurred. Closer investigation revealed that one of the major ESPs accidentally blocked the company’s delivery domain, preventing the emails from reaching their intended recipients. The issue was resolved (after some discussions with the ESP) and the trends returned to the expected level. The ability to merge and compare trends across business and technical domains enabled rapid resolution of the issue with little wasted effort across teams.
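The comparison of trends in this case can be illustrated with a short sketch that finds the first day each metric departs from its own baseline, so that metrics shifting on the same day stand out and unchanged ones can be ruled out. The series values, the three-day baseline and the 25% tolerance are assumptions for illustration, not data from the case.

```python
def first_shift(series, baseline_days=3, tolerance=0.25):
    """Return the index of the first sample that deviates from the baseline
    mean by more than `tolerance` (as a fraction), or None if none does."""
    baseline = sum(series[:baseline_days]) / baseline_days
    if baseline == 0:
        return None  # relative change is undefined for a zero baseline
    for day, value in enumerate(series[baseline_days:], start=baseline_days):
        if abs(value - baseline) / abs(baseline) > tolerance:
            return day
    return None


# Hypothetical daily series (assumed values).
metrics = {
    "revenue": [100, 102, 98, 99, 60, 58, 61],
    "traffic": [5000, 5100, 4900, 5050, 3100, 3000, 3050],
    "load_time": [1.2, 1.1, 1.2, 1.3, 1.2, 1.1, 1.2],
    "bounce_rate": [0.02, 0.02, 0.021, 0.02, 0.31, 0.33, 0.30],
}

shifts = {name: first_shift(series) for name, series in metrics.items()}
print(shifts)
```

In this made-up data, revenue, traffic and bounce rate all shift on the same day while load time does not, which mirrors the reasoning that pointed the troubleshooter away from the platform and toward email delivery.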
Figure 5: Holistic view of system performance
Effective monitoring is essential for running a successful web business. It improves business performance by accelerating response to issues and opportunities that arise from web application operations, helping your company get and keep customers, boost revenue and build brand reputation.
Regardless of their role, everyone responsible for the success of the business needs the ability to assess its status at any given point. Adopting a holistic approach to monitoring that integrates business and technology goals and metrics provides executives, analysts and engineers with a clear picture of how the entire business is operating. It also provides invaluable data on trends and component interactions to guide planning, troubleshooting and strategy optimization. While system engineers don’t need to understand the details of marketing, they should be aware of their company’s marketing objectives and how the web applications they support contribute to, and are affected by, those objectives. Likewise, the CEO doesn’t need to know how the web applications work in the background, but should be able to correlate the importance of key operating metrics, such as email bounce rates for an e-commerce marketing business, and their impact on costs, revenue and market perception.
While almost all web businesses perform some level of monitoring, companies would benefit by adopting a broader, more sophisticated and proactive monitoring strategy. Use the approach recommended in this paper to determine the business objectives, measures and thresholds that define the success of your web application and will drive your monitoring strategy. Create a dashboard that combines this business and technical information to produce a visually impactful, holistic view of your web business performance. Review existing web applications to ensure monitoring is sufficient and used effectively. If your current sources of monitoring data are insufficient, research, acquire, learn and deploy the right set of monitoring tools to support your new guidelines. When developing new web applications, incorporate the design and construction of business and functionality monitors within the scope of the projects to focus efforts on the most important success measures and maximize the benefit of monitoring efforts once the application is deployed.
A holistic, business-oriented approach to monitoring doesn’t supplant the need for detailed metrics to support drilldown into specific components; rather it provides a framework for seeing those metrics as part of a larger picture. With the right monitoring tools, the blind men described in the introduction could correctly understand the elephant and, certainly, assess its overall health!
About the Author
Leon Fayer is Vice President, Business Development for OmniTI. His expertise lies in both web application development and production deployment, including designing effective CMS with varied roles and permissions. Prior to joining OmniTI, Leon worked on projects for enterprise-level clients and the Federal government, including the White House, leading teams through a series of CMS implementations. Leon was one of the architects behind what is now IBM’s premier enterprise content management platform, and has led teams through the architecture, design and development of CMS, CRM and workflow systems for very large entertainment, media and sports event clients. He can be contacted at [email protected].