Plan Your Application Monitoring Strategy
Traditionally, monitoring and capturing diagnostic information about an application’s behavior has been a development exercise that involves writing information to a log file or publishing it to the system event log. The development team is responsible for deciding what information to collect. In this scenario, organizations rely on end users or QA staff for problem detection and notification, and log files provide the diagnostics.
Lowering the total cost of ownership (TCO) for an application requires moving beyond this method of monitoring to a proactive application performance management approach. To do this, it is vital to understand how collected information is interpreted throughout the problem resolution process and to standardize the presentation of that diagnostic information across all applications. Using a standard platform for monitoring and data collection throughout the application lifecycle reduces TCO. It also shortens development cycles, because developers can focus less on instrumentation and more on core application logic. Time-to-market improves as well, because both development and QA cycles are shorter.
Imagine an application that requires access to a file on a file server. What happens if the IT department changes a security policy and causes an "access denied" error? The health model rules will note an application state of "failed" and automatically notify the appropriate team within the IT department. Additionally, the rules would typically collect supporting diagnostic information that helps indicate the type of problem, the details of this particular instance of the problem, and the steps required to resolve the error. The diagnostic information collected would likely include the specific file being accessed, the security error, and the precise permissions required to restore normal application behavior.
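The file-access scenario above can be sketched as a single health rule. This is a minimal illustration, not the API of any particular monitoring product: the names Alert, check_file_access, simulate_denied, the team name "it-security-team", and the file path are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Alert:
    """What a health rule reports when the application leaves a healthy state."""
    state: str          # health state noted by the rule, e.g. "failed"
    notify: str         # team to notify (hypothetical team name)
    diagnostics: dict   # supporting information for whoever resolves the problem


def check_file_access(path: str, opener: Callable = open) -> Optional[Alert]:
    """Hypothetical health rule: try to read the file the application needs.

    Returns None when the file can be opened, or an Alert describing the
    failure when the operating system denies access.
    """
    try:
        with opener(path, "rb"):
            return None
    except PermissionError as exc:
        return Alert(
            state="failed",
            notify="it-security-team",
            diagnostics={
                "file": path,                    # the specific file being accessed
                "error": str(exc),               # the security error as reported
                "required_permission": "read",   # what must be restored
            },
        )


def simulate_denied(path: str, mode: str):
    """Stand-in for open() that mimics a policy change denying access."""
    raise PermissionError(13, "Permission denied", path)


if __name__ == "__main__":
    alert = check_file_access("/srv/share/orders.csv", opener=simulate_denied)
    print(alert)
```

Passing the opener as a parameter is a design choice made only so that the failure can be simulated without changing real permissions. A production rule would call the operating system directly and forward the alert to whatever notification channel the organization uses.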
This example illustrates how the health model enables management of well-known potential application problems. However, relying on the health model alone can be costly from both a design and development perspective, because it does not necessarily accommodate unanticipated problems. A more cost-effective and proactive approach is to combine a health model with an always-on application monitoring solution, one that provides 24/7 detection and diagnosis of both expected and unexpected application problems.
To improve the problem resolution process, ensure optimal application availability, and lower the total cost of application ownership, a monitoring environment should provide "roll-up" capabilities that combine application events and state transitions to deliver an overall view of both application performance and the health state of individual servers, services, and applications.
For example, a business application is likely to depend on at least four separate areas of functionality: a data tier, an application tier, an interface tier, and utility services such as Active Directory, DNS, and networking. Suppose the application encounters an error stemming from a database server that has corrupted data or has run out of memory. The health model would indicate a "failed" state for the application. If the IT operator cannot see the performance data and error messages relating to the database server, it will be difficult to diagnose the problem accurately. A similar problem occurs when an application component relies on a web service exposed by another organization for its source data. The health model will indicate a "failed" state for the application even though the fault lies with the external web service, not with the application itself.
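One way to picture roll-up is as a walk over the application's dependencies that reports the worst health state found and the chain of components leading to it. The sketch below assumes an acyclic dependency graph. The names Health, Component, and roll_up, as well as the component names, are hypothetical and not taken from any specific monitoring platform.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Tuple


class Health(IntEnum):
    """Health states ordered so that a larger value means a worse state."""
    HEALTHY = 0
    DEGRADED = 1
    FAILED = 2


@dataclass
class Component:
    name: str
    own_state: Health = Health.HEALTHY
    depends_on: List["Component"] = field(default_factory=list)


def roll_up(component: Component) -> Tuple[Health, List[str]]:
    """Return the worst state among a component and its dependencies,
    plus the path from the component down to where that state originates.

    Assumes the dependency graph has no cycles.
    """
    worst, path = component.own_state, [component.name]
    for dep in component.depends_on:
        dep_state, dep_path = roll_up(dep)
        if dep_state > worst:
            worst, path = dep_state, [component.name] + dep_path
    return worst, path


# Hypothetical business application with the four areas named in the text.
database = Component("database-server", Health.FAILED)
data_tier = Component("data-tier", depends_on=[database])
app_tier = Component("application-tier")
ui_tier = Component("interface-tier")
utilities = Component("utility-services")
application = Component(
    "order-application",
    depends_on=[data_tier, app_tier, ui_tier, utilities],
)

state, cause_path = roll_up(application)
print(state.name, " -> ".join(cause_path))
# prints: FAILED order-application -> data-tier -> database-server
```

With the path in hand, the operator sees both that the application is in a "failed" state and that the failure originates in the database server rather than in the application's own code.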
Integration With Problem Management Workflow
Integration is the next requirement. New applications and monitoring solutions should integrate into existing incident and problem management workflows. Most monitoring solutions can provide application information in a format compatible with existing formal processes. A monitoring solution should also be flexible enough to fit the needs of individual teams. Developers and operations staff should have formalized communication channels, so that bug reports and feature requests follow a strictly controlled process. Seamlessly integrating new applications into existing management systems, structures, and methods minimizes application downtime and avoids the communication problems that plague companies and leave application problems unresolved.
As code is developed, the information in the health model assists in locating faults and accelerates development, thereby reducing schedule risk and lowering development costs. In addition, when the code fails, developers can use the resulting state changes to quickly locate the fault and determine the root cause. Because the health model contains all the state changes, developers can be more confident that their instrumentation will detect errors.
Adhering to best practices and principles for designing and developing applications depends on establishing an accurate and comprehensive monitoring solution. This solution should provide coverage for the entire application, including its dependencies on other services and components, and it should contain the knowledge required to diagnose and resolve application errors and to verify that they have been resolved.
Like the application, the monitoring solution should evolve as business and end-user needs change. When new features or code are added to an application, the health model should reflect those changes. Evolving the monitoring solution might involve adding new server discovery rules, changing roll-up rules and settings, or even adding completely new rules and alerts. By ensuring that both the application and the monitoring solution are scalable and can evolve, organizations realize a greater return on their application development and management investments.
Organizations today are focused on the same thing they always have been: return on investment (ROI). In the application development world, ROI is tightly linked to TCO. Efforts to shorten time to ROI and lower TCO pose their own risks and challenges, such as how to maintain and manage applications cost-effectively as both end-user and business requirements change and potentially render the application obsolete. Implementing the correct policies and application performance monitoring solutions is a proven strategy for mitigating risk and helping organizations reach the often elusive goal of ROI.