OpenTSDB Package Installation and Extracting Time Series Data Points

OpenTSDB is a specialized database for storing sequences of data points generated over a period of time at uniform intervals. It uses HBase as the underlying database in order to handle huge amounts of data. This article addresses the challenges developers face during the build and set-up process. It also explains how to leverage OpenTSDB's HTTP APIs to develop client programs, so that users can create their own user interfaces for charts and graphs without depending on the standard features provided by OpenTSDB.

Figure 1: OpenTSDB Read and Write Path Architecture

Installing Gnuplot

OpenTSDB uses Gnuplot to render its graphs. Make sure the Gnuplot installation supports PNG images; if not, add the Gnuplot dependent libraries for PNG support.

[user@host ~]$ tar xzf gnuplot-4.4.2.tar.gz
[user@host ~]$ cd gnuplot-4.4.2
[user@host gnuplot-4.4.2]$ ./configure --prefix=/usr/lib/gnuplot-4.4.2
[user@host gnuplot-4.4.2]$ make install
[user@host gnuplot-4.4.2]$ ln -s /usr/lib/gnuplot-4.4.2/bin/gnuplot /usr/bin/gnuplot

To test that Gnuplot was installed successfully, type the gnuplot command; it should open the Gnuplot terminal.
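
For example, PNG support can be checked from the Gnuplot prompt by listing the available terminal types; the exact listing varies by build, but a png entry should appear if PNG support was compiled in.

gnuplot> set terminal
...
png  PNG images using libgd and TrueType fonts
...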

Installing HBase

Data put in OpenTSDB is stored in HBase, so HBase needs to be installed before OpenTSDB. HBase must also be running before the HBase tables can be created from TSD. OpenTSDB supports both a single-node HBase instance and a full cluster set-up. Follow the instructions for setting up HBase.
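
For a quick single-node test set-up, a standalone instance can be started with the scripts bundled in the HBase distribution. The version number and paths below are only placeholders; use whichever release matches your environment.

[user@host ~]$ tar xzf hbase-0.94.2.tar.gz
[user@host ~]$ cd hbase-0.94.2
[user@host hbase-0.94.2]$ ./bin/start-hbase.sh
[user@host hbase-0.94.2]$ ./bin/hbase shell    # verify with the "status" command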

Installing TSDB

Locate and extract the OpenTSDB tar file.

[user@host ~]$ tar xzf opentsdb-1.0.0.tar.gz
[user@host ~]$ cd opentsdb-1.0.0

Run the build script to install OpenTSDB

As the build script executes, it creates the build directory, a temporary (tsd.tmp) directory, the static root directory, and so on. The build script automatically compiles all the files and deploys the package into the net directory. If compilation succeeds, run the install command from the build directory.

[user@host opentsdb-1.0.0]$ ./build.sh
+ test -f configure
+ test -d build
+ test -f Makefile
+ ../configure
... etc.
[user@host build]$ make install

Starting TSD (Time Series Daemon)

Once the OpenTSDB installation has completed successfully, start OpenTSDB. There are four flags whose values need to be passed when starting TSD [--port, --staticroot, --cachedir, --zkquorum]:

  • Port: The TCP port on which TSD listens (default: 4242).
  • Staticroot: The web root from which static files (/s URLs) are served.
  • Cachedir: A directory under which the results of requests are cached. The tsd.tmp directory created by the build script can be used, or a new directory can be created for this purpose.
  • Zkquorum: Optional flag. For a single ZooKeeper instance, specify its host name; for an ensemble, specify a comma-separated list of host names.
[user@host build]$ tsdb tsd --port=4242 --staticroot=/hadoop/opentsdb-1.0.0/build/staticroot/ --cachedir="tsd.tmp" --zkquorum="webhost,webhost1"

The above command can be stored in a shell script, e.g. tsdb-start.sh, which can optionally be run with nohup so that OpenTSDB keeps running even if the session dies.
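
A minimal tsdb-start.sh might look like the following sketch; the install path simply mirrors the command above and should be adjusted to your environment.

#!/bin/bash
# tsdb-start.sh - wraps the TSD start command shown above
cd /hadoop/opentsdb-1.0.0/build
./tsdb tsd --port=4242 --staticroot=/hadoop/opentsdb-1.0.0/build/staticroot/ --cachedir="tsd.tmp" --zkquorum="webhost,webhost1"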

[user@host build]$ nohup ./tsdb-start.sh &

Once the script executes successfully, OpenTSDB is ready to serve requests. The web-based user interface can be accessed at http://<tsd-host>:4242, where <tsd-host> is the host on which TSD is running.

Generating TSDB tables in HBase

HBase tables need to be created before any metric data is loaded through OpenTSDB. The create_table.sh script, located in the src directory under the OpenTSDB installation, can be run to create the required tables.

[user@host src]$ ./create_table.sh

By default, LZO compression is enabled in the script. If the script is run with the LZO option, the required JAR file needs to be present in the HBase lib directory. LZO compression is recommended in a production environment; for testing purposes, the compression option can be set to none.
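
If LZO is not available, compression can be disabled by overriding the script's COMPRESSION environment variable; the HBase path below is only a placeholder.

[user@host src]$ env COMPRESSION=NONE HBASE_HOME=/path/to/hbase ./create_table.sh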

Creating Metrics for HBase Schema

Once the tables are created in HBase, the metric names for which time series data will be added need to be registered. The mkmetric command can be used to register metric names.

[user@host build]$ tsdb mkmetric total.bytes.sent total.bytes.received
metrics  total.bytes.sent    : [0, 0, 1]
metrics  total.bytes.received: [0, 0, 2]

Self-metrics for OpenTSDB can be registered as shown below.

[user@host build]$ vi create_metrics.sh
echo stats | nc -w 1 webhost 4242 | awk '{ print $1 }' | sort -u | xargs tsdb mkmetric
[user@host build]$ ./create_metrics.sh
metrics tsd.compaction.count: [0, 0, 1]
...
metrics tsd.uid.cache-size: [0, 0, 22]

Loading Metrics Data

There are many ways to collect time series data for particular metrics. Metrics can be collected with the tcollector framework or a custom client program, and commands can be used to bulk-load data from compressed flat files. The stats command can be used to collect metrics about OpenTSDB itself. The sample script shown later under Loading OpenTSDB self-metrics collects the stats metrics at 5-second intervals and loads them into TSDB with the help of the put command.

The standard format for loading data points into TSDB is metric timestamp value tags. For example, a sample data point put into TSDB looks like this: "tsd.rpc.received 1360752045 773390 host=webpresto".
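
For example, a single data point in this format can be pushed to the TSD's telnet-style interface with the put command through netcat; the host name below is a placeholder.

[user@host build]$ echo "put tsd.rpc.received 1360752045 773390 host=webpresto" | nc -w 30 webhost 4242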

Figure 2: Bulk loading data points using command-line batch import from files

The file should contain the data in the format metric timestamp value tags [tagk=tagv], e.g. tsd.http.latency_50pct 136436594 56 type=all host=webhost. If the data file is huge, it is recommended to compress it in GZip format.
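
For illustration, the first few lines of such an import file might look like the following; the timestamps and values are made up.

tsd.http.latency_50pct 1364365940 56 type=all host=webhost
tsd.http.latency_50pct 1364366000 61 type=all host=webhost
tsd.http.latency_75pct 1364365940 92 type=all host=webhost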

[user@host build]$ tsdb import loadmetrics_datapoints.gz

Bulk loading data points using OpenTSDB TextImporter.java from files

Set the common TSDB options and the path of the file to import, then run TextImporter.java to load the metric data points.

final class CliOptions {

    static {
        InternalLoggerFactory.setDefaultFactory(new Slf4JLoggerFactory());
    }

    /** Adds common TSDB options to the given {@code argp}. */
    static void addCommon(final ArgP argp) {
        argp.addOption("--table", "tsdb",
                "Name of the HBase table where to store the time series"
                + " (default: tsdb).");
        argp.addOption("--uidtable", "tsdb-uid",
                "Name of the HBase table to use for Unique IDs (default: tsdb-uid).");
        argp.addOption("--zkquorum", "127.0.0.1",
                "Specification of the ZooKeeper quorum to use (default: localhost).");
        argp.addOption("--zkbasedir", "/usr/lib/hbase/",
                "Path under which is the znode for the -ROOT- region (default: /hbase).");
    }
}

Loading OpenTSDB self-metrics

[user@host build]$ vi collect_metrics.sh
#!/bin/bash
# Ask the TSD for its own stats every $INTERVAL seconds and feed them
# back to the TSD as "put" commands.
INTERVAL=5
IPADDR=127.0.0.1
PORT=4242
while :; do
  echo stats || exit
  sleep $INTERVAL
done | nc -w 30 $IPADDR $PORT \
    | sed 's/^/put /' \
    | nc -w 30 $IPADDR $PORT

Every 5 seconds, the script will collect the data points and send them to the TSD.

[user@host build]$ ./collect_metrics.sh

HTTP APIs for Getting Time Series Data Points

OpenTSDB comes packaged with a web-based UI for accessing time series data and generating graphs. However, users often want their own custom UI and charting solutions. OpenTSDB provides a set of HTTP-based APIs so that any application can invoke queries, retrieve OpenTSDB data points, and draw its own graphs. We have added a sample Java client to read the OpenTSDB data points for a given metric. The data retrieved from OpenTSDB is a list of timestamps and the data point associated with each timestamp for the given metric.
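
For instance, the same ASCII query endpoint that the Java client below constructs can be exercised directly with curl; the host name, dates, and output values here are only illustrative.

[user@host ~]$ curl 'http://webhost:4242/q?start=2013/02/08-00:00:00&end=2013/02/12-00:00:00&m=sum:tsd.hbase.latency_50pct&ascii'
tsd.hbase.latency_50pct 1360310400 42 host=webhost
tsd.hbase.latency_50pct 1360310410 57 host=webhost
...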

Figure 3: TimeSeriesMetricVO

TimeSeriesMetricVO holds the list of TimeSeriesRecords for a particular metric. We have created a sample Java client to fetch the time series metric details. Modify the opentsdb.properties file with the OpenTSDB server URL before running the application. The metric name and the dates in TestClient.java will need to be changed as necessary.

package com.opentsdb.client;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

import com.opentsdb.client.beans.TimeSeriesMetricVO;
import com.opentsdb.client.beans.TimeSeriesRecord;

public class OpenTSDBClient {

    public static final String DATE_FORMAT = "yyyy/MM/dd-HH:mm:ss";

    public TimeSeriesMetricVO getTSDBMetricVO(OpenTSDBQueryParameter parameter) throws Exception {
        HttpClient httpclient = new DefaultHttpClient();
        HttpGet httpget = new HttpGet(getURLForMetricsData(parameter));
        TimeSeriesMetricVO timeSeriesMetricVO = new TimeSeriesMetricVO();
        HttpResponse response = null;
        HttpEntity entity = null;
        InputStream instream = null;
        try {
            response = httpclient.execute(httpget);
            entity = response.getEntity();
            if (entity != null) {
                instream = entity.getContent();
                BufferedReader reader = new BufferedReader(new InputStreamReader(instream));
                // Each line of the ASCII response holds one data point: metric timestamp value tags
                for (String line; (line = reader.readLine()) != null;) {
                    TimeSeriesRecord record = getTimeSeriesRecord(line);
                    if (record != null) timeSeriesMetricVO.addRecord(record);
                }
            }
        } catch (Exception e) {
            throw new Exception(e.getMessage());
        } finally {
            if (instream != null) instream.close();
            httpclient.getConnectionManager().shutdown();
        }
        timeSeriesMetricVO.setMetricName(parameter.getMetricName());
        return timeSeriesMetricVO;
    }

    private TimeSeriesRecord getTimeSeriesRecord(String timeSeriesResponse) {
        String[] timeSeriesDataArray = timeSeriesResponse.split(" ");
        if (timeSeriesDataArray.length < 3) return null;
        TimeSeriesRecord record = new TimeSeriesRecord();
        record.setTimestamp(Long.parseLong(timeSeriesDataArray[1]));
        record.setValue(Long.parseLong(timeSeriesDataArray[2]));
        return record;
    }

    public String getURLForMetricsData(OpenTSDBQueryParameter parameter) throws IOException {
        StringBuilder sb = new StringBuilder();
        sb.append(getBaseURL())
          .append("/q?start=")
          .append(formatDate(parameter.getStartDate()))
          .append("&end=")
          .append(formatDate(parameter.getEndDate()))
          .append("&m=")
          .append(parameter.getAggregateFunction().getAggregateFunctionValue())
          .append(":")
          .append(parameter.getMetricName())
          .append("&ascii");
        System.out.println(sb.toString());
        return sb.toString();
    }

    private String formatDate(Date pDate) {
        SimpleDateFormat simpleDateFormat = new SimpleDateFormat(DATE_FORMAT);
        return simpleDateFormat.format(pDate);
    }

    private String getBaseURL() throws IOException {
        Properties configProperties = new Properties();
        configProperties.load(this.getClass().getClassLoader().getResourceAsStream("opentsdb.properties"));
        return configProperties.getProperty("base.url");
    }
}
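
The opentsdb.properties file read by getBaseURL() is assumed to contain a single base.url entry pointing at the TSD, for example:

# opentsdb.properties
base.url=http://webhost:4242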

TestClient.java is the invoker class for OpenTSDBClient.java. To construct the URL for the HTTP request, certain parameters need to be passed to OpenTSDB, such as the metric name, start time, end time, and aggregate function type. The members of OpenTSDBQueryParameter must be set before it is passed as an argument to OpenTSDBClient.

    private Date startDate;
    private Date endDate;
    private String metricName = "";
    private AggregateFunctionType aggregateFunction;

TestClient.java invokes OpenTSDBClient.java with the OpenTSDBQueryParameter as the argument.

package client;

import java.util.GregorianCalendar;

import com.opentsdb.client.OpenTSDBClient;
import com.opentsdb.client.OpenTSDBQueryParameter;
import com.opentsdb.client.OpenTSDBQueryParameter.AggregateFunctionType;
import com.opentsdb.client.beans.TimeSeriesMetricVO;
import com.opentsdb.client.beans.TimeSeriesRecord;

public class TestClient {

    public static void main(String[] args) throws Exception {
        OpenTSDBQueryParameter parameter = new OpenTSDBQueryParameter();
        parameter.setStartDate(new GregorianCalendar(2013, 1, 8).getTime());
        parameter.setEndDate(new GregorianCalendar(2013, 1, 12).getTime());
        parameter.setMetricName("tsd.hbase.latency_50pct");
        parameter.setAggregateFunction(AggregateFunctionType.SUM);

        OpenTSDBClient client = new OpenTSDBClient();
        TimeSeriesMetricVO metricVO = client.getTSDBMetricVO(parameter);

        if (metricVO.getTsdbRecordList() != null) {
            System.out.println(metricVO.getMetricName());
            for (TimeSeriesRecord record : metricVO.getTsdbRecordList()) {
                System.out.println(record.getTimestamp() + "  " + record.getValue());
            }
        }
    }
}
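
When run against a TSD that has data for the requested metric, the client prints the metric name followed by timestamp/value pairs, roughly as follows; the values shown are illustrative.

tsd.hbase.latency_50pct
1360310400  42
1360310410  57
...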

This article has addressed the challenges developers face during the build and set-up process, and explained how to leverage OpenTSDB's HTTP APIs to develop client programs so that users can create their own user interfaces for charts and graphs without depending on the standard features provided by OpenTSDB. We believe this approach greatly improves the process of developing OpenTSDB and HBase-based applications by enhancing the reusability of the code.

Kalpana C is a Technology Analyst with ILCLOUD at Infosys Labs. She has a decade of experience in Java/J2EE and Big Data-related frameworks and technologies.

Co-Author: Priyadarshi Sahoo is a Technology Lead at Infosys Ltd. He has more than 8 years of experience in Java/J2EE related technologies.
