Statistic Derivation
Once the MetricCollector has created a metric, it immediately passes the metric to a calculator for processing. Calculators are discrete components that derive and maintain statistic results, and each one is responsible for managing its own data. Create one calculator instance per statistic.
Define a common calculator interface. The interface formalizes the collaboration between calculators and the MetricCollector, constrains calculator behavior, and allows new types of calculators to be added transparently. At a minimum, the Calculator interface should provide a way to apply metrics and to access the results. Only a limited number of calculator types are needed, so once you have established a basic set (e.g., count, minimum, maximum, and average), you can reuse them across statistics.
Create a base calculator interface:
public interface Calculator {
    ..
    public String getName();
    public void applyMetric(Metric metric);
    public void reset();
    public double getResult();
    public long getTimestamp();
}
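As an illustration of how a concrete calculator might satisfy such an interface, here is a minimal sketch of an average calculator. The `SimpleMetric` and `SimpleCalculator` types are simplified stand-ins invented for this example (the article's own Metric class and the elided parts of Calculator are not shown), so treat their names and the `getValue()` accessor as assumptions rather than the real API.

```java
// Hypothetical, simplified stand-in for the article's Metric class.
final class SimpleMetric {
    private final long timestamp;
    private final double value;

    SimpleMetric(long timestamp, double value) {
        this.timestamp = timestamp;
        this.value = value;
    }

    long getTimestamp() { return timestamp; }
    double getValue() { return value; }
}

// Reduced version of the Calculator interface, typed to the stand-in metric.
interface SimpleCalculator {
    String getName();
    void applyMetric(SimpleMetric metric);
    void reset();
    double getResult();
    long getTimestamp();
}

// Keeps a running sum and sample count so the average is available at any time.
final class AverageCalculator implements SimpleCalculator {
    private final String name;
    private double sum;
    private long samples;
    private long timestamp;

    AverageCalculator(String name) { this.name = name; }

    @Override public String getName() { return name; }

    @Override public void applyMetric(SimpleMetric metric) {
        sum += metric.getValue();
        samples++;
        timestamp = metric.getTimestamp();
    }

    @Override public void reset() {
        sum = 0.0;
        samples = 0;
        timestamp = 0L;
    }

    // Returns 0.0 before any metric has been applied, avoiding division by zero.
    @Override public double getResult() {
        return samples == 0 ? 0.0 : sum / samples;
    }

    @Override public long getTimestamp() { return timestamp; }
}
```

The calculator owns all of its state (sum, sample count, timestamp), which is what lets a collector treat every calculator type uniformly through the interface.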
The UserCount statistic uses a running-count calculator. Create a RunningCount calculator class that implements the Calculator interface; the MetricCollector loads it at startup. The class maintains a running count that is incremented on receipt of a login event and decremented on receipt of a logout event:
public class RunningCount extends Count {
    ..
    public void applyMetric(Metric metric) {
        setTimestamp(metric.getTimestamp());
        if (metric.getFirstReading() != null) {
            // Increment the running count (login event)
            count = count + 1;
        } else {
            // Decrement the running count (logout event)
            count = count - 1;
        }
    }
}
Statistic Reporting
You now have all the building blocks in place to intercept logs, generate metrics, and derive statistic results. You can easily add new statistics and calculator types via configuration. All that remains is to run your application and do something useful with your statistical data.
Statistic results can be stored in the DBMS for later analysis, logged, or forwarded on to other systems to manage or visualize. Dumping results to logs isn't particularly useful for real-time monitoring, as it requires someone or something to watch the logs for events of interest. For production systems, the real payoff is alerting: the ability to alert operators in real time of issues or events that indicate production problems.
Alerting
The simplest form of alerting is threshold-based: an alert is generated when a statistic value reaches or crosses a predefined limit, for example when UserCount >= 100, when memory is low, or when the error rate exceeds a certain threshold. To extend the log interceptor so that it supports alerting, preconfigure your alerts. An alert must specify a statistic and a threshold rule. Extend the MetricCollector to evaluate each alert whenever a metric is generated. If an alert rule fires, a notification is generated and sent to a dedicated log4j alerts category. It is then a simple matter to redirect all alert notifications (using existing log4j appenders) to the appropriate channel: stdout, file, email, NT Event Log, JMS, or some other target:
protected void calculateStatistic(Calculator calculator, Metric metric) {
    calculator.applyMetric(metric);
    Alert a = (Alert) alerts.get(metric.getStatisticName());
    if (a != null) {
        a.evaluateAlert(calculator.getResult());
    }
}
Configure a UserCount alert for when the number of users exceeds 100:
alert.MaxUserLimit.description=Maximum Number of users reached
alert.MaxUserLimit.statistic=UserCount
alert.MaxUserLimit.warn= >100
alert.MaxUserLimit.category=alerts
To send alerts to the NT event logger, add the appropriate log4j configuration for the alerts category.
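The threshold rules used in the alert configuration (values such as `>100`) have to be parsed and compared against the current statistic result at some point. The article does not show how its Alert class does this, so the following is only a sketch of one possible approach. `ThresholdRule`, its `parse` and `fires` methods, and the set of supported operators are all assumptions made for this example; a real implementation would send a notification to the alerts category instead of returning a boolean.

```java
// Hypothetical helper: parses a rule such as ">100" or ">=5"
// and tests statistic values against it.
final class ThresholdRule {
    private final String operator;
    private final double limit;

    private ThresholdRule(String operator, double limit) {
        this.operator = operator;
        this.limit = limit;
    }

    static ThresholdRule parse(String rule) {
        String text = rule.trim();
        // Check two-character operators first so ">=" is not read as ">".
        for (String op : new String[] {">=", "<=", ">", "<"}) {
            if (text.startsWith(op)) {
                double limit = Double.parseDouble(text.substring(op.length()).trim());
                return new ThresholdRule(op, limit);
            }
        }
        throw new IllegalArgumentException("Unsupported threshold rule: " + rule);
    }

    // Returns true when the given statistic value satisfies the rule.
    boolean fires(double value) {
        switch (operator) {
            case ">=": return value >= limit;
            case "<=": return value <= limit;
            case ">":  return value > limit;
            default:   return value < limit; // only "<" remains
        }
    }
}
```

With this sketch, `ThresholdRule.parse(" >100").fires(100)` is false and `fires(101)` is true, which matches a rule that triggers only once the user count exceeds 100.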
Rates—Error Rates
To illustrate another statistic, define an ErrorRate statistic for measuring the number of error occurrences over a period of time (e.g., error rate per second). The filter trigger for the statistic is any log message that contains the text "Exception". Use the Rate statistic calculator:
statistic.ErrorRate.description=Errors per second
statistic.ErrorRate.calculator=Rate
statistic.ErrorRate.first.match=.*Exception.*
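The `first.match` value above is a regular expression. Assuming it is applied to the whole log message with `java.util.regex`, a filter along the following lines would decide whether a message triggers the statistic. The `MessageFilter` class is hypothetical, and the use of `Pattern.DOTALL` is a choice made for this sketch so that multi-line messages (for example, ones that include a stack trace) can still match.

```java
import java.util.regex.Pattern;

// Hypothetical filter: decides whether a log message should trigger a statistic.
final class MessageFilter {
    private final Pattern pattern;

    MessageFilter(String regex) {
        // DOTALL lets "." match line terminators, so messages that span
        // several lines can still match ".*Exception.*".
        this.pattern = Pattern.compile(regex, Pattern.DOTALL);
    }

    boolean accepts(String logMessage) {
        // matches() requires the entire message to match, which is why the
        // configured expression is wrapped in ".*" on both sides.
        return pattern.matcher(logMessage).matches();
    }
}
```

Without `DOTALL`, a message whose text continues on a second line would not match `.*Exception.*` under `matches()`, because `.` does not match line terminators by default.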
Create a Rate calculator that implements the Calculator interface:
public class Rate extends CalculatorAdapter {
    public static final long DEFAULT_PERIOD = 60000; // 1 minute (in millis)
    protected long starttimestamp;
    protected double rate;
    protected long period;
    protected long occurrences;
    ...
    public void applyMetric(Metric metric) {
        // Count every matching metric
        occurrences = occurrences + 1;
        if (metric.isSingle()) {
            // Elapsed time since the start timestamp, expressed in periods
            double elapsedTime = metric.getReading(Unit.TIME) - starttimestamp;
            double divisor = elapsedTime / period;
            rate = occurrences / divisor;
            setTimestamp(metric.getTimestamp());
        }
    }

    public double getResult() {
        return rate;
    }
}
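To make the arithmetic concrete: the rate is the number of occurrences divided by the elapsed time expressed in periods. The sketch below isolates that formula in a hypothetical `RateMath` helper (not part of the article's code) and, as a precaution, returns 0.0 when no time has elapsed. With a one-minute period, 5 errors in 30 seconds work out to 5 / (30000 / 60000) = 10 errors per minute.

```java
// Hypothetical helper that isolates the rate formula used by a Rate calculator.
final class RateMath {
    private RateMath() {}

    // occurrences:   number of events counted so far
    // elapsedMillis: time since counting started, in milliseconds
    // periodMillis:  length of the reporting period, in milliseconds
    static double rate(long occurrences, long elapsedMillis, long periodMillis) {
        if (elapsedMillis <= 0 || periodMillis <= 0) {
            return 0.0; // assumption: report no rate until some time has elapsed
        }
        double periods = (double) elapsedMillis / periodMillis;
        return occurrences / periods;
    }
}
```

The same formula gives 60 errors per minute for 120 errors over two minutes, since the elapsed time then spans exactly two periods.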
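Whenever an exception is detected in a log message, the Rate calculator recalculates the ErrorRate. If you want to detect significant rate changes, define an error rate alert. For example, with the default one-minute period, raise a warning when the error rate reaches 5 errors per minute and a critical alert when it reaches 10 errors per minute, and route the notifications to the operator by email through the alerts category: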
alert.ErrorRateIncrease.description=ErrorRate Increase
alert.ErrorRateIncrease.statistic=ErrorRate
alert.ErrorRateIncrease.warn= >=5
alert.ErrorRateIncrease.critical= >=10
alert.ErrorRateIncrease.category=alerts
Fine-grained Control Over Logs
Log interception is just one approach to application instrumentation. Its advantages are that it is real-time, simple, pure Java, and relatively lightweight. Because it adds few classes and little extra processing, it is a production-friendly approach. By leveraging log4j, you have fine-grained control over which logs get processed and at which level, and measurement can be turned on and off with one line of configuration.
From the single-JVM monitoring solution shown here, you can scale up to enterprise or distributed architectures: capture metrics from multiple JVMs and channel them to a centralized server, or push them to a UI via RMI or JMS.
The downside is that log interception will only ever be as good as the data that's logged—garbage in, garbage out. However, for applications with information-rich logs, log interception is a powerful yet simple way to detect bottlenecks and forewarn of production issues.