Forking and Joining Java to Maximize Multicore Power

If writing software is hard, then writing applications that take full advantage of the ever-more-prevalent multicore/multi-CPU hardware (without introducing subtle race conditions and consistency errors) is even harder. Numerous solutions have been floated for Java as well as other environments: software transactional memory API libraries, functional languages, and even entirely new environments. However, no solution has emerged as the clear winner, or even the clear front-runner, yet.

For its part, Sun has offered language features for structuring concurrent applications. Java 5 introduced a slew of concurrency data structures, behaviors, and locking tools as part of the java.util.concurrent package, and Java 6 added a few more thread-safe data structures. Now Sun is set to release the first “new” approach to concurrency since the Thread API debuted in Java 1.0. Based on work Doug Lea began back in 2000 to bring a “work-stealing” mechanism to Java, this new API is known officially as JSR-166y (to distinguish it from the original JSR-166, which shipped in Java 5). To the initiated, it is known colloquially as the fork/join framework.

When facing a certain type of complex programming problem, a traditional approach has been to split the solution code up into pieces and then spin each piece off into its own thread. However, this divide-and-conquer approach presents developers with one of two problems:

  1. The individual pieces are too large. They simply end up forcing other threads to wait while one or two of them finish their computations.
  2. The pieces are too small. The cost of spinning up a thread overwhelms the actual work it does, and no benefit is gained.

Getting the granularity just right often leads developers to shout, “Ah, to heck with it!” At which point, they dump all the work in a single thread and don’t bother trying to fully optimize it. While this works well during development when the data sets or required execution loads are light, the code eventually bogs down in production as the system tries to scale up. The race condition here is obvious: Will the system need to scale up before the developer can get his resume out to other employers?

Lea found that the divide-and-conquer approach often resulted in premature optimization. The essence of his fork/join approach is to take full advantage of the cores/CPUs available by splitting the work up into a series of independent tasks, forking each one off to a separate processing agent, and then joining the results after all those tasks are done. As presented in Lea’s paper, the pseudo-code involved looks something like this:

Result solve(Problem problem) {
  if (problem is small)
    directly solve problem
  else {
    split problem into independent parts
    fork new subtasks to solve each part
    join all subtasks
    compose result from subresults
  }
}

In other words, one key benefit of the fork/join framework is that it decides, at the point of solving the problem, whether or not to continue recursively splitting the task into smaller tasks.
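To make the pseudo-code concrete, here is a minimal sketch of the same recursive split, written against the RecursiveTask and ForkJoinPool classes as they eventually shipped in Java 7's java.util.concurrent package (the jsr166y class names differ slightly). The array-summing task and the threshold of 1,000 elements are illustrative choices, not part of Lea's paper:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums a long[] by recursively splitting it until the pieces are
// small enough to solve directly -- a direct translation of the
// solve() pseudo-code above.
public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1000; // "problem is small"
    private final long[] data;
    private final int lo, hi;

    public SumTask(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {          // directly solve problem
            long sum = 0;
            for (int i = lo; i < hi; i++)
                sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;           // split into independent parts
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                         // fork a new subtask
        long rightSum = right.compute();     // reuse this thread for one half
        long leftSum = left.join();          // join the forked subtask
        return leftSum + rightSum;           // compose result from subresults
    }

    public static void main(String[] args) {
        long[] data = new long[10000];
        for (int i = 0; i < data.length; i++)
            data[i] = i + 1;
        long sum = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum); // 50005000
    }
}
```

Note that compute() forks only one half and handles the other half itself; forking both and then joining both would leave the current thread idle while it waits.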

Note the use of the term “processing agent” above, instead of “thread.” The basic problem mentioned earlier, that the cost of spinning up a thread can outweigh the benefit of spinning off the work in the first place, still holds true. For that reason, several tasks may in fact be executed by a single thread in a sequential manner. The best part is that this is a tunable parameter, and the developer doesn’t need to worry about how and where that sequential decision-making takes place.

Getting Started: The Domain Problem

Using the fork/join framework happens at several different levels, but in many respects the easiest way to get started with the framework is with the ParallelArray class. When to use the fork/join framework obviously is a decision best made only with the full context of the situation at hand, but in general consider fork/join for searching, sorting, summarizing, and filtering data sets. For example, consider a simple student grading system, using a Student domain class similar to the one in Listing 1.

Observant readers will notice that Listing 1 uses public fields. In production code you would obviously replace them with the appropriate bean methods, but the class fulfills the pedagogical purposes of this example just fine.
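Listing 1 itself is not reproduced here, but for readers following along, a minimal stand-in looks something like the class below. The field names are inferred from the code later in the article (studentID, graduationYear) plus a gpa field implied by the filtering discussion; the real listing may differ:

```java
import java.util.Locale;

// A minimal stand-in for the Student class of Listing 1, using the
// public fields the article mentions. Field names beyond studentID
// and graduationYear are guesses based on later examples.
public class Student {
    public int studentID;
    public String name;
    public int graduationYear;
    public double gpa;

    public Student(int studentID, String name, int graduationYear, double gpa) {
        this.studentID = studentID;
        this.name = name;
        this.graduationYear = graduationYear;
        this.gpa = gpa;
    }

    @Override
    public String toString() {
        return String.format(Locale.US, "%d: %s (class of %d, GPA %.2f)",
                studentID, name, graduationYear, gpa);
    }
}
```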

The domain problem in Listing 1 is simple: a small college/university has a number of students, all of which the system needs to be able to sort, select, and display. (Of course, the problem is tailor-made for the fork/join framework; demos have that tendency.) To begin, you first need to generate a number of random students. Use a helper method to create a number of students in an array:

import jsr166y.forkjoin.*;

public class ForkJoin
{
  private static Student[] generateStudents(int classSize)
  { /* ... */ }

  public static void main(String[] args)
  {
    assert (args.length >= 1);
    int classSize = Integer.parseInt(args[0]);

    assert (classSize > 0);

    Student[] currentClass = generateStudents(classSize);
    System.out.println("Random seed data:");
    for (Student s : currentClass)
    {
      System.out.println("\t" + s);
    }
    System.out.println("");
    // . . .

The actual details of the generateStudents() method are unimportant for this demo; it simply generates a bunch of Student objects.
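Since the article leaves those details open, here is one possible way to fill in the helper. Everything about it (the ID scheme, the year range, the GPA distribution) is an arbitrary choice for the sake of the demo, and the nested Student class is a minimal stand-in for Listing 1 so the snippet compiles on its own:

```java
import java.util.Random;

// One plausible implementation of the generateStudents() helper the
// article declines to show; the data ranges are arbitrary demo values.
public class StudentGenerator {
    // Minimal stand-in for the Student class of Listing 1.
    public static class Student {
        public int studentID;
        public int graduationYear;
        public double gpa;
    }

    private static final Random random = new Random();

    public static Student[] generateStudents(int classSize) {
        Student[] students = new Student[classSize];
        for (int i = 0; i < classSize; i++) {
            Student s = new Student();
            s.studentID = 100000 + i;                    // unique IDs
            s.graduationYear = 2009 + random.nextInt(4); // 2009 through 2012
            s.gpa = random.nextInt(401) / 100.0;         // 0.00 through 4.00
            students[i] = s;
        }
        return students;
    }
}
```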

Moving On: Applying the Fork/Join Framework

The first step to using the fork/join framework is to create a ParallelArray instance around the data in question (the student array). To do so, the fork/join framework requires you to hook into the java.util.concurrent threading classes. Specifically, you need to create an Executor, and fortunately, the fork/join framework already provides one:

    // . . .
    ForkJoinExecutor fjPool =
      new ForkJoinPool(25); // use 25 threads
    ParallelArray<Student> processor =
      ParallelArray.createUsingHandoff(currentClass, fjPool);
    // . . .

At this point, you have a ParallelArray and you can start fork/joining.

The first task is to list all of the students sorted by their student ID number. Traditional Java would have you walk the array item by item, inserting each one into a sorted results collection. Alternatively, you could use the Collections.sort() method to do the sorting, but that requires a Comparator-implementing class to do the actual comparison. As it turns out, writing that Comparator is pretty simple, and ParallelArray also takes a Comparator to its sort() method:

    // . . .
    System.out.println("Sorted by student ID:");
    processor
      .sort(studentIDComparator);
    System.out.println("");
    // . . .
  }

  private static final Comparator<Student> studentIDComparator =
    new Comparator<Student>()
    {
      public int compare(Student lhs, Student rhs)
      {
        if (lhs.studentID < rhs.studentID)
          return -1;
        else if (lhs.studentID > rhs.studentID)
          return 1;
        else
          return 0;
      }
    };

Notice how the studentIDComparator, rather than being passed in as an anonymous inner class instance right inline, is stored as a final reference in the class. This is a common pattern when working with the fork/join framework.

The results may be sorted at this point, but how can you tell? You need to display the results on the console. You could try to extract the results from the object returned by sort(), but there’s an easier way. The ParallelArray class can take another functor, again either an anonymous inner class instance or a stored final reference, this time of the Ops.Procedure type, and call it on each element of the returned array:

    // . . .
    System.out.println("Sorted by student ID:");
    processor
      .sort(studentIDComparator)
      .apply(printOutStudent);
    System.out.println("");
    // . . .
  }

  private static final Ops.Procedure<Student> printOutStudent =
    new Ops.Procedure<Student>()
    {
      public void op(Student s)
      {
        System.out.println("\t" + s);
      }
    };

Wrapping Up: Using the ParallelArray API

By now, a couple of things about the ParallelArray API design should be more obvious. First, the ParallelArray class and its kin are written in a fluent API style. That means you can string operations together simply by chaining method calls, where each successive call operates on the collection returned by the prior one. So the collection returned from sort() is the set on which apply() invokes the Ops.Procedure instance’s op() method.

Second, the original contents of the ParallelArray are never modified. This is a huge advantage as it lessens any concerns about thread safety. That is, you don’t have to worry about synchronizing access to the ParallelArray because generally you need to set up locks only when modifying the contents, not reading them. (This same principle is why strings are immutable.)

Continuing with the demo, you can do various things with the ParallelArray. For example, you can determine which students are supposed to be graduating this year and display them. This is a filter operation that requires an Ops.Predicate instance, which this time returns a boolean indicating whether or not the object in question should be included in the results:

    // . . .
    System.out.println("Should graduate this year:");
    processor
      .withFilter(isSenior)
      .apply(printOutStudent);
    System.out.println("");
    // . . .
  }

  private static final Ops.Predicate<Student> isSenior =
    new Ops.Predicate<Student>()
    {
      public boolean op(Student s)
      {
        return s.graduationYear == 2009; // this year
      }
    };

The initial criterion is just to find out whether the students are supposed to graduate this year, but remember that not all who attend college graduate college. Students have to attain a certain grade point average in order to earn those coveted diplomas. One solution would be to write a new Ops.Predicate that takes both criteria into account, but that’s going to lead to an explosion of predicates if you’re not careful. The chaining-based approach of the ParallelArray API allows for a better solution (see Listing 2).
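Listing 2 is not reproduced here, but the underlying idea, keeping each predicate small and chaining the filter steps rather than writing one predicate per combination of criteria, can be sketched in plain Java. The Pred interface and Chain class below are illustrative stand-ins, not part of jsr166y:

```java
import java.util.ArrayList;
import java.util.List;

// A plain-Java sketch of the chaining approach behind Listing 2:
// small, single-purpose predicates combined by chaining filter calls,
// instead of an ever-growing family of compound predicates.
public class ChainDemo {
    interface Pred<T> { boolean op(T t); }

    static class Chain<T> {
        private final List<T> items;
        Chain(List<T> items) { this.items = items; }

        // Each withFilter() call narrows the working set and returns a
        // new Chain, so calls can be strung together fluently.
        Chain<T> withFilter(Pred<T> p) {
            List<T> kept = new ArrayList<T>();
            for (T t : items)
                if (p.op(t))
                    kept.add(t);
            return new Chain<T>(kept);
        }

        List<T> all() { return items; }
    }

    static class Student {
        int graduationYear;
        double gpa;
        Student(int y, double g) { graduationYear = y; gpa = g; }
    }

    static final Pred<Student> isSenior = new Pred<Student>() {
        public boolean op(Student s) { return s.graduationYear == 2009; }
    };
    static final Pred<Student> hasPassingGpa = new Pred<Student>() {
        public boolean op(Student s) { return s.gpa >= 2.0; }
    };

    public static void main(String[] args) {
        List<Student> body = new ArrayList<Student>();
        body.add(new Student(2009, 3.5)); // senior, passing
        body.add(new Student(2009, 0.5)); // senior, failing
        body.add(new Student(2010, 3.9)); // not a senior
        List<Student> graduating = new Chain<Student>(body)
            .withFilter(isSenior)
            .withFilter(hasPassingGpa)
            .all();
        System.out.println(graduating.size()); // prints 1
    }
}
```

The difference in the real framework, of course, is that each filter step can be evaluated in parallel across the pool's worker threads.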

This particular university, like most universities, is particularly hard on athletes to make sure they’re not just there to look good on TV. (Yes, that’s my little revenge on all those guys who graduated from my college with 0.5 GPAs because they threw 30 touchdowns and won the national championships. Not that I’m still bitter or anything.)

Following Up: More to Learn

A cursory glance at the JSR-166y Javadocs page will reveal that it contains a lot more APIs and classes, but you have now seen the main highlights of the library. Don’t be too intimidated by the large number of Ops.* interfaces; most of them are just primitive-type specializations of the generic versions (Ops.Predicate, Ops.Procedure, Ops.Reducer, and so on), and the rest aren’t really worth studying until you need them.

You will also notice that the fork/join framework looks a bit different from the traditional Java API. The fork/join framework, like many of the current concurrency proposals, takes a more functional approach to its design, in the spirit of such functional languages as Haskell, OCaml, Lisp, F#, or Scala. That will require you to adjust your thinking a bit in order to use it effectively. Because Java doesn’t consider functions/methods to be first-class objects the way those other languages do, you have to adapt slightly by creating functors (function objects). Functors are the anonymous inner-class instances that you pass into the API. This isn’t an indication that Java is going to embrace functional programming, but it is another design style that can yield some interesting and powerful benefits to those who take the time to learn it.

In the meantime, the fork/join framework is still under active development and its APIs are still subject to change all the way up until the JSR is finalized and the Java 7 implementation ships. Chances are, however, that the APIs you see here are, for the most part, the way they will remain. Certainly the concepts, passing in function objects and the fluent API design, are pretty firmly nailed down.

So don’t be afraid to explore the fork/join framework, because nothing, not bugs, not even concurrency bugs, is worse than letting CPU power sit idle while you unroll loops and hand-check thread-safe code.
