
Hive and Hadoop for Data Analytics on Large Web Logs

Developers can use Apache Hive and Hadoop for data analytics on large web logs, analyzing users' browsing patterns and behavior.


Hadoop Hive Data Load

Hive provides tools for easy data extract/transform/load (ETL), a mechanism to impose structure on the data, and a simple SQL-like query language, called QL (also known as Hive QL), that lets users familiar with SQL query the data. At the same time, Hive QL allows programmers familiar with MapReduce to plug in custom mappers and reducers for more sophisticated analysis that the language's built-in capabilities may not support.
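One way to plug in such a custom mapper is Hive's TRANSFORM clause, which streams each input row to an external program as a tab-separated line on stdin and reads tab-separated rows back from its stdout. A minimal sketch of such a program in Java, assuming that streaming contract; the class name `StripQueryMapper` and the choice of stripping query strings from `uri` are hypothetical, for illustration only:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical streaming mapper for Hive's TRANSFORM clause.
// Input rows:  client_ip <TAB> uri
// Output rows: client_ip <TAB> uri with any query string removed
final class StripQueryMapper {

    // Returns the transformed row, or null if the row has fewer than two columns.
    static String transformLine(String line) {
        String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
        if (cols.length < 2) {
            return null;
        }
        String uri = cols[1];
        int q = uri.indexOf('?');
        String path = q >= 0 ? uri.substring(0, q) : uri;
        return cols[0] + "\t" + path;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            String row = transformLine(line);
            if (row != null) {                // skip malformed rows instead of failing the task
                System.out.print(row + "\n"); // Hive reads one output row per line
            }
        }
        System.out.flush();
    }
}
```

Hive could then call it with something like `ADD FILE /path/to/StripQueryMapper.class;` followed by `SELECT TRANSFORM(client_ip, uri) USING 'java -cp . StripQueryMapper' AS (client_ip, path) FROM weblogs;`, assuming a `java` executable is on the PATH of every node that runs the job.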

The following command creates a Hive table for the parsed web log records, one record per line with tab-separated fields.

CREATE TABLE weblogs (
  client_ip string, full_request_date string, day string, month string,
  month_num int, year string, hour string, minute string, second string,
  timezone string, http_verb string, uri string, http_status_code string,
  bytes_returned string, referrer string, user_agent string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
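Because the table is declared with tab-delimited fields stored as a text file, each line of the parsed file must carry the 16 columns in declaration order, separated by single tabs, with no tab or line break inside a value. A minimal sketch of a helper a log parser could use to emit such lines; the class and method names are hypothetical, and replacing embedded tabs with spaces is just one simple (lossy) choice. It writes `\N` for missing values, which is how Hive's default text format represents NULL.

```java
import java.util.List;

// Hypothetical helper that turns one parsed log record into a line for the weblogs table.
final class WeblogRowFormatter {

    // Number of columns declared in the CREATE TABLE statement.
    static final int COLUMN_COUNT = 16;

    static String toHiveRow(List<String> fields) {
        if (fields.size() != COLUMN_COUNT) {
            throw new IllegalArgumentException(
                "expected " + COLUMN_COUNT + " fields, got " + fields.size());
        }
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                row.append('\t');
            }
            String value = fields.get(i);
            // \N is the default NULL marker in Hive text files; tabs and line breaks
            // inside a value would shift or split columns, so they become spaces.
            row.append(value == null ? "\\N" : value.replaceAll("[\t\r\n]", " "));
        }
        return row.toString();
    }
}
```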

The following command loads data from an HDFS location into a Hive table. Without the LOCAL keyword, Hive moves (rather than copies) the file into the table's storage directory.

LOAD DATA INPATH '<HDFS file path of parsed file>' INTO TABLE <table name>  
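Hive does not validate the file against the table schema at load time: with the default text format, missing columns and a non-numeric `month_num` simply read back as NULL at query time. A small pre-load check can catch parser bugs earlier. A minimal sketch, assuming the parsed file is still available on local disk before it goes to HDFS; the class name and the command-line path argument are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-load check for a parsed, tab-delimited web log file.
final class WeblogFileChecker {

    static final int COLUMN_COUNT = 16;   // columns in the weblogs table
    static final int MONTH_NUM_INDEX = 4; // month_num is the only INT column

    // Returns a problem description for one line, or null if the line looks loadable.
    static String checkLine(String line) {
        String[] cols = line.split("\t", -1);
        if (cols.length != COLUMN_COUNT) {
            return "expected " + COLUMN_COUNT + " columns, found " + cols.length;
        }
        try {
            Integer.parseInt(cols[MONTH_NUM_INDEX]);
        } catch (NumberFormatException e) {
            return "month_num is not an integer: '" + cols[MONTH_NUM_INDEX] + "'";
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        List<String> problems = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            String line;
            int lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                String problem = checkLine(line);
                if (problem != null) {
                    problems.add("line " + lineNo + ": " + problem);
                }
            }
        }
        problems.forEach(System.out::println);
        System.out.println(problems.isEmpty() ? "OK to load" : problems.size() + " bad line(s)");
    }
}
```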

Once the data is loaded, ordinary users can query it with Hive QL. The example query below counts the requests made from each client IP address.

SELECT client_ip, COUNT(client_ip) FROM weblogs GROUP BY client_ip;
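Hive compiles this query into a MapReduce job; conceptually, the map phase keys each row by `client_ip` and the reduce phase counts the rows for each key. The in-memory Java sketch below reproduces only the query's result on a small list, to make the semantics concrete. It is not how Hive executes the query, it assumes no NULL `client_ip` values, and the sample IPs are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// In-memory equivalent of:
//   SELECT client_ip, COUNT(client_ip) FROM weblogs GROUP BY client_ip;
// Assumes client_ip is never NULL (groupingBy rejects null keys).
final class RequestsPerIp {

    static Map<String, Long> countByIp(List<String> clientIps) {
        // TreeMap only makes the printed order deterministic;
        // Hive guarantees no particular order without ORDER BY.
        return clientIps.stream()
            .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> sample = List.of("10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1");
        countByIp(sample).forEach((ip, n) -> System.out.println(ip + "\t" + n));
    }
}
```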

Hadoop Hive JDBC Support

Hive also supports JDBC connections. To connect to Hive over JDBC, first start the Hive Thrift server:

export HIVE_PORT=9999
hive --service hiveserver
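Before running a JDBC client, it can help to confirm that something is listening on the port set in `HIVE_PORT`. A minimal sketch using a plain TCP connection attempt; the class name, the `localhost` default, and reusing 9999 from the command above are assumptions, and a successful connect only shows that a process accepts connections on that port, not that it is the Hive server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical pre-flight check: is anything accepting TCP connections on host:port?
final class HiveServerProbe {

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host could not be resolved
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9999; // HIVE_PORT used above
        boolean up = isListening(host, port, 2000);
        System.out.println(host + ":" + port + (up ? " is reachable" : " is not reachable"));
    }
}
```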

Here are the steps to establish a Hive JDBC connection:

    1. Add hive-jdbc0.7.jar to the classpath; this is a type-4 driver.
    2. Use the org.apache.hadoop.hive.jdbc.HiveDriver driver for the connection.
    3. Connection string: jdbc:hive://<hive IP Address>:<hive port>/<database name>
    4. Query the table with Hive QL; the driver returns the results as a JDBC ResultSet.
    5. Use the result set to plot the output in graphs or charts.
    6. Here is a sample Hive JDBC program:

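    import java.sql.*;
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver"); // register the Hive JDBC driver
    // Assumes the Thrift server started above runs on localhost:9999 and uses the "default" database
    Connection con = DriverManager.getConnection("jdbc:hive://localhost:9999/default", "", "");
    Statement stmt = con.createStatement();
    ResultSet res = stmt.executeQuery("SELECT client_ip, COUNT(client_ip) FROM weblogs GROUP BY client_ip");
    while (res.next()) {
      System.out.println(res.getString(1) + "\t" + res.getLong(2)); // client_ip, request count
    }
    con.close();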
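Step 5 leaves the charting open. As one low-tech illustration, the (client IP, count) pairs read from the result set could be rendered as a text bar chart; the class name, bar width, and sample data below are hypothetical, and a real report would more likely hand the same pairs to a charting library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical text "chart" for (client_ip, request count) pairs.
final class TextBarChart {

    // Scales each count to at most maxWidth '#' characters, relative to the largest count.
    static String render(Map<String, Long> counts, int maxWidth) {
        long max = counts.values().stream().mapToLong(Long::longValue).max().orElse(0);
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            int width = max == 0 ? 0 : (int) Math.round((double) e.getValue() * maxWidth / max);
            out.append(String.format("%-15s %s %d\n", e.getKey(), "#".repeat(width), e.getValue()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In practice these pairs would be collected inside the while (res.next()) loop.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("10.0.0.1", 40L);
        counts.put("10.0.0.2", 10L);
        System.out.print(render(counts, 20));
    }
}
```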

Ramasubramanian Thiyagarajan is a senior software engineer involved in the design and development of Java EE applications.