Hive and Hadoop for Data Analytics on Large Web Logs: Page 3



Hadoop Hive Data Load

Hive provides tools for easy data ETL (extract, transform, load), a mechanism for imposing structure on the data, and a simple SQL-like query language, called Hive QL, that lets users familiar with SQL query the data. Hive QL also lets programmers familiar with MapReduce plug in custom mappers and reducers for more sophisticated analysis that the language's built-in capabilities may not support.

The following command creates a Hive table for the parsed web log data. Each row is stored as one line of tab-delimited text.



create table weblogs (client_ip string, full_request_date string, day string, month string,
month_num int, year string, hour string, minute string, second string, timezone string, http_verb string,
uri string, http_status_code string, bytes_returned string, referrer string, user_agent string)
row format delimited fields terminated by '\t' stored as textfile
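
To illustrate the row layout this table expects, here is a minimal Java sketch that splits one tab-delimited line into the sixteen columns declared in the create table command. The class name WeblogLine, the method parse, and the sample values are hypothetical and do not come from the original article; Hive does this splitting itself when it reads the text file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: mirrors the column order of the weblogs table.
class WeblogLine {
    static final String[] COLUMNS = {
        "client_ip", "full_request_date", "day", "month", "month_num", "year",
        "hour", "minute", "second", "timezone", "http_verb", "uri",
        "http_status_code", "bytes_returned", "referrer", "user_agent"
    };

    // Splits one line on tabs and maps each value to its column name.
    static Map<String, String> parse(String line) {
        String[] values = line.split("\t", -1); // -1 keeps trailing empty fields
        if (values.length != COLUMNS.length) {
            throw new IllegalArgumentException(
                "expected " + COLUMNS.length + " fields, got " + values.length);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.length; i++) {
            row.put(COLUMNS[i], values[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        // Made-up sample line, for illustration only.
        String line = String.join("\t",
            "10.0.0.7", "12/Mar/2012:10:15:32 +0000", "12", "Mar", "3", "2012",
            "10", "15", "32", "+0000", "GET", "/index.html",
            "200", "1024", "-", "Mozilla/5.0");
        Map<String, String> row = parse(line);
        System.out.println(row.get("client_ip") + " " + row.get("http_status_code"));
    }
}
```

A line with more or fewer than sixteen fields is rejected here. That is a choice made for this sketch, not a description of how Hive handles malformed rows.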

The following command loads data from an HDFS location into a Hive table.

LOAD DATA INPATH '<HDFS file path of parsed file>' INTO TABLE <table name>

After the data is loaded into the table, users can query it with Hive QL. The following example query returns the number of requests from each client IP address.

SELECT client_ip , COUNT(client_ip) FROM weblogs GROUP BY client_ip
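
The query above groups rows by client_ip and counts the rows in each group. As a rough sketch of that computation, the same aggregation over an in-memory list can be written in plain Java. The class ClientIpCounts and the sample addresses are hypothetical, and the sketch assumes no null IP values; Hive itself would run the query as MapReduce jobs over the table data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of GROUP BY client_ip with COUNT(client_ip).
class ClientIpCounts {
    // Returns the number of occurrences of each IP, sorted by IP string.
    // Assumes the list contains no null entries.
    static Map<String, Long> countByIp(List<String> clientIps) {
        Map<String, Long> counts = new TreeMap<>();
        for (String ip : clientIps) {
            counts.merge(ip, 1L, Long::sum); // start at 1, then add 1 per repeat
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data, one entry per logged request.
        List<String> ips = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.1");
        System.out.println(countByIp(ips)); // prints {10.0.0.1=2, 10.0.0.2=1}
    }
}
```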

Hadoop Hive JDBC Support

Hive also supports JDBC connections. To connect to Hive over JDBC, first start the Hive Thrift Server as follows.

export HIVE_PORT=9999
hive --service hiveserver

Here are the steps to establish a Hive JDBC connection:

  1. Add hive-jdbc0.7.jar to the classpath; this is a type-4 driver.
  2. Use the org.apache.hadoop.hive.jdbc.HiveDriver driver class for the connection.
  3. Use a connection string of the form jdbc:hive://<hive IP Address>:<hive port>/<database name>
  4. Run a Hive QL query against the Hive table; it returns a result set.
  5. Use the result set to present the output as graphs or charts.
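
The connection string in step 3 can be assembled from its three parts. Here is a small sketch of that; the class HiveJdbcUrl and its build method are hypothetical names introduced for illustration, and the port range check is an added assumption, not a requirement stated by Hive.

```java
// Hypothetical helper that builds jdbc:hive://<host>:<port>/<database>.
class HiveJdbcUrl {
    static String build(String host, int port, String database) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host is required");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        if (database == null || database.isEmpty()) {
            throw new IllegalArgumentException("database is required");
        }
        return "jdbc:hive://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Same host, port, and database as the sample program in this article.
        System.out.println(build("10.0.0.1", 9999, "default"));
        // prints jdbc:hive://10.0.0.1:9999/default
    }
}
```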

Here is a sample Hive JDBC program:

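// Uses Connection, DriverManager, Statement and ResultSet from java.sql.
// Load the Hive JDBC driver and connect to the Thrift server started above.
Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
Connection con = DriverManager.getConnection("jdbc:hive://10.0.0.1:9999/default", "", "");
Statement stmt = con.createStatement();
ResultSet res = stmt.executeQuery("SELECT client_ip, COUNT(client_ip) FROM weblogs GROUP BY client_ip");
// Column 1 is the client IP (a string); column 2 is its request count.
while (res.next()) {
    System.out.println(res.getString(1) + "\t" + res.getLong(2));
}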



Ramasubramanian Thiyagarajan is a senior software engineer involved in the design and development of Java EE applications.