Implement Parallel Processing in Your Java Applications : Page 5

How do you as a Java developer adapt your applications to the multi-core and parallel computing trends? A new Java framework can help you build parallel applications quickly.




The DataRush Libraries and Tool Support
DataRush (the libraries, tools, APIs, and the engine itself) is available as a set of JAR files, which is how it supports embedding in your own applications. The JAR packaging also lets you easily package and deploy DataRush applications to servers in your production environment. Supported environments include Windows, Solaris (x86 and SPARC), Linux, HP-UX, and IBM AIX. (Author Note: Although it's not officially supported, DataRush installed and ran perfectly well on my Intel-based Mac as well.)

DataRush includes a library of precompiled operators that you can use when creating your own operators and assemblies. Operators exist to perform data reads and writes on flat files, XML, and relational databases, along with generic logic processing. These operators serve as building blocks for you to reuse, and they reduce the need to implement these common tasks.

Figure 9. DataRush Integrates with Eclipse

For application development, DataRush comes with an Eclipse plug-in that works with the Eclipse IDE as well as the Eclipse Graphical Editing Framework (GEF). The end result is support for DataRush-specific projects with visual modeling and editing of parallel processing tasks (see Figure 9).

You can build, run, and test outside of Eclipse as well, as DataRush integrates with command-line tools such as Ant and unit-test frameworks such as JUnit. DataRush-specific Ant tasks to build and test custom DataRush applications are included in a JAR file. This allows you to automate all aspects of your application builds, including the execution of DataRush test suites.

A Sample DataRush Application
DataRush comes with multiple sample applications to get you started. This section discusses the "New Fields" sample application, as it contains both XML and Java operators. The application loads data from an input file (specified as a property) and reads three fields that make up a simulated sale: a date, a dollar amount, and a product ID. This portion of the processing is completely specified in the DFXML assemblies. The Java portion is an operator that computes the day of the week based upon a date as input. The output of this operator is combined with the sales data, along with a new record identifier generated at processing time, and then written to an output file.
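The per-record work described above can be sketched in plain Java. This is only an illustration of the logic, not the sample's actual operator code, and it does not use the DataRush API. The class name, the ISO date format, the field order, and the ISO day numbering (1 = Monday) are all assumptions made for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Pattern;

/** Plain-Java sketch of the per-record logic; it does not use the DataRush API. */
final class NewFieldsSketch {

    // Assumed input date format (ISO-8601, e.g. 2007-03-05); the sample's real format may differ.
    private static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.ISO_LOCAL_DATE;

    /** Returns the ISO day-of-week number: 1 = Monday through 7 = Sunday (an assumed convention). */
    static int dayOfWeek(String date) {
        return LocalDate.parse(date, DATE_FORMAT).getDayOfWeek().getValue();
    }

    /** Prepends a row ID and appends the day of week to one delimited sales record. */
    static String addFields(long rowId, String line, String separator) {
        // Quote the separator so characters such as '|' are not treated as regex syntax.
        String[] fields = line.split(Pattern.quote(separator), -1);
        String date = fields[0]; // assumed field order: date, amount, product ID
        return rowId + separator + line + separator + dayOfWeek(date);
    }

    public static void main(String[] args) {
        // Hypothetical record; 5 March 2007 was a Monday.
        System.out.println(addFields(1000L, "2007-03-05|19.99|P-42", "|"));
    }
}
```

In the real sample, reading, row-ID generation, and writing are handled by library operators declared in DFXML; only the day-of-week step is custom Java.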

Listing 3 contains the complete assembly definition DFXML file for this application. At the top of the file, properties are defined that control overall processing. Some of the important properties are:

  • inputFileName: the path and name of the file containing the incomplete sales data
  • outputFileName: the path and name of the file to which the completed sales data (with fields added) is written
  • startRowID: the starting identifier for new rows written to the output file
  • fieldSeparator: the delimiter character or string that separates fields in the input file
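To make these four properties concrete, here is a small sketch that parses them with java.util.Properties. Only the property names come from the sample. The values, the class name, and the assumption that the settings are supplied in standard Java .properties syntax are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch: reading the four sample properties with the standard Properties class. */
final class SamplePropertiesSketch {

    // Hypothetical values; only the four property names come from the article.
    static final String EXAMPLE =
            "inputFileName=data/sales-input.txt\n"
          + "outputFileName=data/sales-output.txt\n"
          + "startRowID=1000\n"
          + "fieldSeparator=|\n";

    /** Parses .properties-style text into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(EXAMPLE);
        long startRowId = Long.parseLong(props.getProperty("startRowID"));
        System.out.println("first row ID: " + startRowId);
        System.out.println("separator: " + props.getProperty("fieldSeparator"));
    }
}
```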

The next section of the assembly specification declares the individual operators, plus one process that wraps the custom Java class. Some of these are:

  • read (operator): uses the ReadDelimitedText operator in the DataRush operator library to read the input file
  • genRowID (operator): uses the GeneratedArithmeticSequence operator to generate unique output row identifiers
  • dayOfWeek (process and operator): defines a custom operator, implemented as a Java class, that takes a DATE as its input and produces an integer as its output; its input is linked to the read operator's output

The remaining sections of the specification link operator output and input ports, thereby defining a complete application dataflow.
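The idea of linking one operator's output port to the next operator's input port can be approximated in plain Java with threads and blocking queues. The sketch below is an analogy built on java.util.concurrent, not DataRush code. The stage names mirror the sample's operators, but the class, the queue sizes, and the end-of-stream sentinel are assumptions. Each stage runs in its own thread, so a downstream stage can start work on early records while upstream stages are still producing later ones.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Three pipeline stages (read, dayOfWeek, write) linked by bounded queues. */
final class PipelineSketch {

    private static final String END = "<<END>>"; // sentinel marking end of stream

    /** Runs the pipeline over ISO dates, assumed valid, and returns the "written" rows. */
    static List<String> run(List<String> dates, long startRowId) {
        BlockingQueue<String> readOut = new ArrayBlockingQueue<>(16); // read -> dayOfWeek
        BlockingQueue<String> dayOut = new ArrayBlockingQueue<>(16);  // dayOfWeek -> write
        List<String> written = new ArrayList<>();

        Thread read = new Thread(() -> {
            try {
                for (String d : dates) readOut.put(d);
                readOut.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread dayOfWeek = new Thread(() -> {
            try {
                for (String d = readOut.take(); !d.equals(END); d = readOut.take()) {
                    // Append the ISO day number (1 = Monday) to the record.
                    dayOut.put(d + "," + LocalDate.parse(d).getDayOfWeek().getValue());
                }
                dayOut.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread write = new Thread(() -> {
            try {
                long rowId = startRowId;
                for (String r = dayOut.take(); !r.equals(END); r = dayOut.take()) {
                    written.add((rowId++) + "," + r);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        read.start();
        dayOfWeek.start();
        write.start();
        try {
            read.join();
            dayOfWeek.join();
            write.join(); // join makes the writer's list updates visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return written;
    }

    public static void main(String[] args) {
        run(List.of("2007-03-05", "2007-03-06"), 1000L).forEach(System.out::println);
    }
}
```

The bounded queues provide back-pressure: a fast upstream stage blocks when its consumer falls behind instead of buffering without limit.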

Build the Sample Application
Building the application is a two-step process, but you can combine both steps via an Ant script or an Eclipse project. First, from within the DataRush samples directory, compile the Java code with the following command:

> javac -d build/classes -classpath ../dfre/lib/dfreapi.jar src/example/newfields/DayOfWeekProcess.java

Next, run the DataRush Assembler on the assembly specification:

> dfa -d build/classes -sp src src/example/newfields/NewFieldsTextFile.df.xml

Although the specification is split across two .df.xml files, both are assembled into binaries because one references the other.

Run the Sample Application
After a successful build, you can run the application with the DataRush Engine. You must supply a properties file (one is included with the sample) that specifies the paths to the input and output files. Use the following command:

> dfe -cp build/classes -pf newfields.properties example.newfields.NewFieldsTextFile

When executed, the sample application will write its data to the output file, NewFieldsSampleOutput.txt (see Listing 4).

The Results
To see firsthand the benefits of DataRush and the pipeline parallelism it employs, I ran this sample on both a dual-core 1.83 GHz Core Duo system and a single-core (but higher-clocked) 3 GHz Pentium 4 system. The application completed in 2.011 seconds on the dual-core machine and in 5.1 seconds on the Pentium 4, making the dual-core run roughly 2.5 times faster. I was impressed that even a simple application gained that much throughput on a multi-core machine.

Parallel Computing Comes to the Development Process
Just as multi-core computing has brought affordable parallel-processing computers to the fore, Pervasive DataRush has brought parallel computing to the Java community in a comprehensive, easy-to-use package. DataRush addresses the complex problems of developing applications for multi-CPU, symmetric multiprocessing (SMP) systems, and it comes in a reusable form that you can leverage across your applications without rewriting all your code.

Eric Bruno is a New York-based consultant who has built high-volume Web applications, database systems, and real-time transactional systems in Java and C++. Visit www.ericbruno.com for more about him.