Apache Hive Create Pattern
In order to create a Hive table, one has to write a Create Table query specific to the table and its columns/fields in HiveQL. To avoid that, we propose a generic method that returns the needed Create Table query specific to the particular table. The same method can be used to generate a Create Table query for all the possible types of Hive tables. This generic Create Table query builder takes the HiveTable model class object as input and deliver the Create Table query as return value based on the information provided through the HiveTable model class object. Figure 2 shows the solution pattern.
Click here for larger image
Figure 2. The Create Table Query Generator Pattern
The following algorithm is the "Create Table …" HiveQL query generator solution using the HiveTable model class explained in the previous section.
Algorithm-1: Generate CreateTable Query
Input: HiveTable model class object
Output: String object having HiveQL Create table query.
Step-1: If (Table name is NULL or TableFields object is NULL")
Then
1. Report error "Unable to create table query".
2. Terminate the algorithm.
end if
Step-2: Start creating the Hive Create Table query by adding following components, if they are not null, into the query string.
Add database and table name information.
For each column in "List<TableField> tableFields"
- Add field name, field data type and comment details.
For each partition in "List<TablePartition> tablePartitions"
- Add partition name, data type and comment details.
Add row formatters if any in "RowFormat rowFormat" object.
Step-3: Return the Hive Create Table query.
Apache Hive Delete Pattern
The solution pattern for Delete is illustrated in Figure 3 for the Create pattern. Here, the input HiveTable model class object will be used to create the HiveQL query to delete the table. The algorithm will be similar to Algorithm-1 described in the Create pattern, but it will create a "DROP TABLE …" HiveQL query using the information available from the input HiveTable model class object.
Click here for larger image
Figure 3. The Delete Table Query Generator Pattern
Apache Hive Select Pattern
Select (i.e. reading data from one or more tables) is one of the most heavily used operations in Hive-based applications. Generating a Select query is really complicated because the resulting Select query can contain information about:
- one or more columns in the result set
- one or more tables in the case of Join operations
- different conditional clauses and other information about grouping, aggregation and ordering of result set data
Due to all these permutations, a developer has to hardcode individual Select queries for each requirement. To address this challenge, we propose a Select query builder model class, which will be provided as input to the Hive select Query builder to generate a Hive Select query. The solution pattern for Select is described in Figure 4.
Click here for larger image
Figure 4. The Select Query Generator Pattern
The following class diagram represents the Select query builder class SelectQueryBuilder, which includes the method getSelectQuery()
for generating the Select query. The other one is the WhereClause model class used by the SelectQueryBuilder class.
SelectQueryBuilder |
String databaseName;
String tableName;
String alias;
List<String> resultSetColumn;
List<WhereClause> whereClauses;
List<String> groupByColumns;
List<String> sortByColumns;
int limit; |
|
WhereClause |
String fieldName;
String fieldCriteria;
Object criteriaValue;
String criteriaValueType;
String clauseCondition; |
String getWhereClauseExpression(); |
The following algorithm is the "SELECT … FROM … WHERE" HiveQL query generator pattern solution using the SelectQueryBuilder class.
Algorithm-2: Generate Select_From_Where Query
Step-1: If (Table name is NULL")
Then
3. Report error "Unable to Select table query".
4. Terminate the algorithm.
end if
Step-2: Start creating the Hive Select Table query by adding following components, if they are not null, into the result query string.
Add database and table name information.
Add result set columns information available in List<String> resultSetColumn
For each where clause in "List<WhereClause> whereClauses"
- Add where clause expression into the Select query by calling getWhereClauseExpression() method on respective where clause object in the list.
For each column name in "List<String> groupByColumns"
- Add group by column name, into the Select query.
For each column name in "List<String> sortByColumns"
- Add sort by column name, into the Select query.
Add limit factor available in limit variable.
Step-3: Return the Hive Select Table query.
Apache Hive Insert Pattern
The way data is loaded or inserted into the Hive table is different from the common insert operation performed over the table in traditional RDBMS systems. Hive allows the user to load data directly from a file into the table instead of inserting one row at a time. The insert pattern shown in Figure 5, proposes the solution approach for generating a "LOAD DATA … INTO TABLE …" HiveQL query for loading file contents into an existing Hive table.
Click here for larger image
Figure 5. The Load Query Generator Pattern