Login | Register   
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


advertisement
 

Apache Hive Query Patterns: Generalized HiveQL Queries for Common Operations-2 : Page 2


advertisement

Apache Hive Create Pattern

In order to create a Hive table, one has to write a Create Table query specific to the table and its columns/fields in HiveQL. To avoid that, we propose a generic method that returns the needed Create Table query specific to the particular table. The same method can be used to generate a Create Table query for all the possible types of Hive tables. This generic Create Table query builder takes the HiveTable model class object as input and deliver the Create Table query as return value based on the information provided through the HiveTable model class object. Figure 2 shows the solution pattern.



Click here for larger image

Figure 2. The Create Table Query Generator Pattern



The following algorithm is the "Create Table …" HiveQL query generator solution using the HiveTable model class explained in the previous section.

Algorithm-1: Generate CreateTable Query Input: HiveTable model class object Output: String object having HiveQL Create table query. Step-1: If (Table name is NULL or TableFields object is NULL") Then 1. Report error "Unable to create table query". 2. Terminate the algorithm. end if Step-2: Start creating the Hive Create Table query by adding following components, if they are not null, into the query string. Add database and table name information. For each column in "List<TableField> tableFields" - Add field name, field data type and comment details. For each partition in "List<TablePartition> tablePartitions" - Add partition name, data type and comment details. Add row formatters if any in "RowFormat rowFormat" object. Step-3: Return the Hive Create Table query.

Apache Hive Delete Pattern

The solution pattern for Delete is illustrated in Figure 3 for the Create pattern. Here, the input HiveTable model class object will be used to create the HiveQL query to delete the table. The algorithm will be similar to Algorithm-1 described in the Create pattern, but it will create a "DROP TABLE …" HiveQL query using the information available from the input HiveTable model class object.



Click here for larger image

Figure 3. The Delete Table Query Generator Pattern

Apache Hive Select Pattern

Select (i.e. reading data from one or more tables) is one of the most heavily used operations in Hive-based applications. Generating a Select query is really complicated because the resulting Select query can contain information about:

  • one or more columns in the result set
  • one or more tables in the case of Join operations
  • different conditional clauses and other information about grouping, aggregation and ordering of result set data

Due to all these permutations, a developer has to hardcode individual Select queries for each requirement. To address this challenge, we propose a Select query builder model class, which will be provided as input to the Hive select Query builder to generate a Hive Select query. The solution pattern for Select is described in Figure 4.



Click here for larger image

Figure 4. The Select Query Generator Pattern

The following class diagram represents the Select query builder class SelectQueryBuilder, which includes the method getSelectQuery() for generating the Select query. The other one is the WhereClause model class used by the SelectQueryBuilder class.

SelectQueryBuilder

String databaseName; String tableName; String alias; List<String> resultSetColumn; List<WhereClause> whereClauses; List<String> groupByColumns; List<String> sortByColumns; int limit;

String getSelectQuery();

WhereClause

String fieldName; String fieldCriteria; Object criteriaValue; String criteriaValueType; String clauseCondition;

String getWhereClauseExpression();

The following algorithm is the "SELECT … FROM … WHERE" HiveQL query generator pattern solution using the SelectQueryBuilder class.

Algorithm-2: Generate Select_From_Where Query Step-1: If (Table name is NULL") Then 3. Report error "Unable to Select table query". 4. Terminate the algorithm. end if Step-2: Start creating the Hive Select Table query by adding following components, if they are not null, into the result query string. Add database and table name information. Add result set columns information available in List<String> resultSetColumn For each where clause in "List<WhereClause> whereClauses" - Add where clause expression into the Select query by calling getWhereClauseExpression() method on respective where clause object in the list. For each column name in "List<String> groupByColumns" - Add group by column name, into the Select query. For each column name in "List<String> sortByColumns" - Add sort by column name, into the Select query. Add limit factor available in limit variable. Step-3: Return the Hive Select Table query.

Apache Hive Insert Pattern

The way data is loaded or inserted into the Hive table is different from the common insert operation performed over the table in traditional RDBMS systems. Hive allows the user to load data directly from a file into the table instead of inserting one row at a time. The insert pattern shown in Figure 5, proposes the solution approach for generating a "LOAD DATA … INTO TABLE …" HiveQL query for loading file contents into an existing Hive table.



Click here for larger image

Figure 5. The Load Query Generator Pattern



Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap