Processing Large Datasets in a Grid Environment: Page 2

Using LZO compression and protocol buffers can improve the speed and scalability of large dataset processing.

Putting It All Together: LZO Compression Use Case

This section walks through the code for an LZO compression use case of our solution. The sample use case program processes large dataset files in parallel on a single-node system.

The following employee .proto file defines the proto objects.



option java_package = "com.emp";
option java_outer_classname = "EmpProtos";

message Employee {
  required string name = 1;
  required int32 id = 2;   // Unique ID number for this employee
  optional string email = 3;
}

message Emprecord {
  repeated Employee emp = 1;
}
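Before the snippets below will compile, the .proto file must be run through the protoc compiler to generate the EmpProtos Java classes. Assuming the file is saved as employee.proto (the file name is our assumption), the invocation looks like this:

protoc --java_out=src/ employee.proto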

The following code snippet uses the generated proto Java classes for object management and writes the employee object to the output file in serialized form.

package com.emp;

import java.io.*;
import com.emp.EmpProtos.*; // Java classes generated by the proto compiler

// Writes an employee record to the output file in serialized form
public static void addEmployee(int id, String name, String email,
        FileOutputStream fout) throws Exception {
    Emprecord.Builder recBuilder = Emprecord.newBuilder();
    Employee.Builder empBuilder = Employee.newBuilder();
    empBuilder.setName(name);
    empBuilder.setId(id);
    empBuilder.setEmail(email);
    recBuilder.addEmp(empBuilder.build());
    recBuilder.build().writeTo(fout);
}

The following code snippet reads the employee records from the input file.

// Reads the employee records from the input file
public static void readEmprecord(FileInputStream fin) throws Exception {
    Emprecord empList = Emprecord.parseFrom(fin);
    System.out.println("== Employee List ==");
    if (empList != null) {
        for (Employee emp : empList.getEmpList()) {
            System.out.println(emp.getId() + " " + emp.getName() + " " + emp.getEmail());
        }
    }
}

The following code snippet compresses a block of the proto object file. We use lzocomdecomp.jar to compress and decompress the proto object block file. The code shows the syntax for calling the LZO compression class, which writes a file with the .lzo extension to indicate that it was compressed with LZO.

LzoCompress.compress(<proto object block file path>)

The following code snippet uncompresses a compressed proto object block file. We create an object of the class and pass the compressed proto object file as the value. The class supports threading, and you can customize it to read blocks based on your proto object. The output file has the .unlzo extension.

FileUncompress(<Compressed proto object file path>)
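The LzoCompress and FileUncompress classes above live in the article's own lzocomdecomp.jar, so their internals are not shown here. The block compress/decompress round trip they perform can be sketched with the JDK's built-in Deflater/Inflater as a stand-in codec; the class and method names (BlockCodecDemo, compress, decompress) are ours, not part of the article's JAR, and DEFLATE is substituted for LZO purely for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch of a block codec round trip. The article uses LZO via its own
// lzocomdecomp.jar; here the JDK's DEFLATE codec stands in for LZO.
public class BlockCodecDemo {

    // Compresses one block of serialized proto data
    public static byte[] compress(byte[] block) {
        Deflater deflater = new Deflater();
        deflater.setInput(block);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the original block from its compressed form
    public static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            out.write(buf, 0, inflater.inflate(buf));
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] block = "1,John,john@example.com".getBytes("UTF-8");
        byte[] restored = decompress(compress(block));
        System.out.println(new String(restored, "UTF-8"));
    }
}
```

As with the article's LZO flow, each serialized block is compressed on the way out and restored byte-for-byte on the way back in, so the proto parser sees exactly the data that was written.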

The Required JARs

The following supported JAR files are required for executing the sample use case.

  • protobuf-java-<version of compiler>.jar
  • protobuf-format-java-1.1.jar
  • lzocomdecomp.jar: This is our own JAR for LZO compression and decompression.
  • Sampleusecase_emp.jar: This JAR processes large datasets in a multithreaded way.

Here is the syntax for executing the sample use case program.

Empaddread <dataset infilepath> <outfile path> <blocksize in bytes>

The input file is comma-separated, in this format:

<id>,<name>,<email>

Here is the field format:

  • id: integer
  • name: string
  • email: string
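As a sketch of how one such input line maps onto the parameters that addEmployee expects, the following self-contained snippet splits a line into its three fields; the class and helper names (InputLineDemo, parseLine) are ours, not part of the article's code.

```java
// Splits one comma-separated input line of the form <id>,<name>,<email>
// into the three values the sample program passes to addEmployee.
public class InputLineDemo {

    // Hypothetical helper: returns {id, name, email} from one input line
    public static String[] parseLine(String line) {
        // Limit of 3 keeps any stray commas inside the email field intact
        String[] fields = line.split(",", 3);
        int id = Integer.parseInt(fields[0].trim()); // id must be an integer
        return new String[] { String.valueOf(id), fields[1].trim(), fields[2].trim() };
    }

    public static void main(String[] args) {
        String[] f = parseLine("101,John,john@example.com");
        System.out.println(f[0] + " " + f[1] + " " + f[2]); // prints "101 John john@example.com"
    }
}
```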


Sivakumar Kuppusamy is a product technical lead involved in the design and development of Java EE applications.