Intel Go Parallel

Intel Threading Building Blocks: parallel_for()
Time your code, find where concurrent processing fits in, sniff out shared data points and you're ready to roll with parallel_for(). 

Concurrency needn't be so complicated that you avoid it completely. One of the easiest ways to gain performance increases on multi-core platforms is with the parallel_for algorithm. To get a sense of how real-world developers are using Intel Threading Building Blocks, we spoke with Vincent Tan, a programmer with Pongrass Australia Pty. Ltd. in Bondi Junction, New South Wales, Australia.

As described on the Intel Software Network, Tan created a multithreaded version of par2cmdline 0.4, a utility commonly used to repair corrupted Usenet postings via Reed-Solomon coding. By using the Intel Threading Building Blocks 2.0 library (specifically TBB's mutex, concurrent_hash_map, atomic, and parallel_for constructs), the program can process files concurrently instead of serially. As a result, dual-core machines can nearly double throughput when creating or repairing data files.

How did you learn about parallel_for?

I read the Intel TBB tutorial and reference manuals. From there, I looked at the sample code.

Was it easy to add the algorithm to your application?

After studying the sample code, it was straightforward to convert the code. The harder part was finding all of the shared resources (such as member variables) and then ensuring that access to them was thread-safe.

Did you make any mistakes?

I originally specified a grain size, but I found that it did not really help, because TBB's default behavior was already good enough for the code to which I tried to apply it.
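To make the term concrete: in TBB the grain size is an optional argument to tbb::blocked_range, and a range is treated as divisible while it holds more elements than the grain size. The sketch below is not TBB code. It is a serial, standard-library stand-in (SimpleRange, split_into_chunks and chunks_for are hypothetical names) that assumes a simple halve-until-small-enough policy, only to show how the grain size bounds the pieces of work. A real TBB partitioner may choose pieces differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a half-open index range [begin, end)
// that carries a grain size. Not a TBB type.
struct SimpleRange {
    std::size_t begin;
    std::size_t end;
    std::size_t grain;  // assumed to be at least 1

    std::size_t size() const { return end - begin; }
    // A range may be split only while it is larger than the grain size.
    bool is_divisible() const { return size() > grain; }
};

// Recursively halve the range until every piece is no larger than
// the grain size, collecting the resulting pieces in order.
void split_into_chunks(const SimpleRange& r, std::vector<SimpleRange>& out) {
    if (!r.is_divisible()) {
        out.push_back(r);
        return;
    }
    const std::size_t mid = r.begin + r.size() / 2;
    split_into_chunks(SimpleRange{r.begin, mid, r.grain}, out);
    split_into_chunks(SimpleRange{mid, r.end, r.grain}, out);
}

// Convenience wrapper: the pieces produced for [0, n) with the given grain.
std::vector<SimpleRange> chunks_for(std::size_t n, std::size_t grain) {
    if (grain == 0) grain = 1;  // guard: a zero grain would never stop splitting
    std::vector<SimpleRange> out;
    split_into_chunks(SimpleRange{0, n, grain}, out);
    return out;
}
```

Under this policy, chunks_for(1000, 100) yields 16 pieces of 62 or 63 elements each, while chunks_for(1000, 1000) yields a single piece. A larger grain means fewer, bigger pieces and less scheduling overhead; a smaller grain means more pieces to balance across cores.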

What would be the most interesting use for this algorithm?

To be honest, I view it as a tool to solve a particular problem. The obvious for loops in the project's code pretty much dictated the use of parallel_for. I'll put it another way: If you can process the elements of a random-access array in parallel (i.e., the elements have no interdependencies) then parallel_for is the tool you probably want.
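The "no interdependencies" condition can be checked with a small experiment. The sketch below uses only the standard library and a hypothetical serial driver, for_each_chunk, that visits chunks either front to back or back to front, imitating a scheduler that promises no ordering. With TBB itself, the equivalent call would be along the lines of tbb::parallel_for(tbb::blocked_range<std::size_t>(0, n), body). All function names here are illustrative assumptions, not taken from par2cmdline.

```cpp
#include <cstddef>
#include <vector>

// Independent loop body: out[i] depends only on in[i].
void square_range(const std::vector<int>& in, std::vector<int>& out,
                  std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i != end; ++i) out[i] = in[i] * in[i];
}

// Dependent loop body: data[i] reads data[i - 1], so order matters.
void running_sum_range(std::vector<int>& data,
                       std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i != end; ++i)
        if (i > 0) data[i] += data[i - 1];
}

// Hypothetical serial driver: calls body(begin, end) once per chunk,
// in forward or reverse chunk order. chunk is assumed to be >= 1.
template <typename Body>
void for_each_chunk(std::size_t n, std::size_t chunk, bool reversed, Body body) {
    const std::size_t count = (n + chunk - 1) / chunk;
    for (std::size_t k = 0; k < count; ++k) {
        const std::size_t c = reversed ? count - 1 - k : k;
        const std::size_t begin = c * chunk;
        const std::size_t end = (begin + chunk < n) ? begin + chunk : n;
        body(begin, end);
    }
}

std::vector<int> squares(const std::vector<int>& in, bool reversed) {
    std::vector<int> out(in.size());
    for_each_chunk(in.size(), 2, reversed,
        [&](std::size_t b, std::size_t e) { square_range(in, out, b, e); });
    return out;
}

std::vector<int> running_sum(std::vector<int> data, bool reversed) {
    for_each_chunk(data.size(), 2, reversed,
        [&](std::size_t b, std::size_t e) { running_sum_range(data, b, e); });
    return data;
}
```

For the input {1, 2, 3, 4, 5}, squares gives the same result in either chunk order, so that loop is a candidate for parallel_for. running_sum gives {1, 3, 6, 10, 15} front to back but a different result back to front, which signals a loop-carried dependency that a plain parallel_for would break.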

What performance or productivity benefits did you gain?

CPU utilization on a dual-core machine went from ~40-45 percent to ~80-85 percent. Because I/O is still performed serially (non-overlapped), the code never achieves 100 percent utilization—but a doubling of performance is good enough for most users.

How should a developer get started with parallel_for?

Read the Intel TBB tutorial on the Documentation page of threadingbuildingblocks.org and study the sample code. The reference manual helps out with the nitty-gritty details but you'll probably only need it if you need to specify the grain size.

TBB Code Listing
Here's a snippet of parallel_for at work in the par2cmdline source code.

The For Loop:

Helper functions:

// par2creator.cpp::973
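// New function to hold the original loop body.
// It is called through a Par2Creator pointer below, so it is a member of
// Par2Creator and uses that object's outputbuf, inputbuf, chunksize and rs.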
void ProcessData(u32 outputblk, u32 endindex, size_t blklength, u32 inputblk) {
  for( ; outputblk != endindex; ++outputblk ) {
    // Select the appropriate part of the output buffer
    void *outbuf = &((u8*)outputbuf)[chunksize * outputblk];

    // Process the data through the RS matrix
    rs.Process(blklength, inputblk, inputbuf, outputblk, outbuf);
  }
}

// Encapsulates the loop body
class ApplyRSProcess {
public:
  ApplyRSProcess(Par2Creator* obj, size_t blklength, u32 inputblk) :
    _obj(obj), _blklength(blklength), _inputblk(inputblk) {}
  void operator()(const tbb::blocked_range<u32>& r) const {
    _obj->ProcessData(r.begin(), r.end(), _blklength, _inputblk);
  }
private:
  Par2Creator* _obj;
  size_t       _blklength;
  u32          _inputblk;
};
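The body-object pattern in this listing can be tried without TBB or the par2cmdline sources. The sketch below is a reduced stand-in under stated assumptions: IndexRange imitates the begin()/end() interface of tbb::blocked_range<u32>, Creator replaces Par2Creator and merely counts how often each block is processed, and run_in_chunks is a hypothetical serial driver standing in for the scheduler. None of these names come from the project. It illustrates why operator() can be const, as TBB requires of a parallel_for body, and still drive work on the target object: the body holds only a pointer to it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint32_t u32;

// Hypothetical stand-in for tbb::blocked_range<u32>: a half-open range.
class IndexRange {
public:
    IndexRange(u32 b, u32 e) : begin_(b), end_(e) {}
    u32 begin() const { return begin_; }
    u32 end() const { return end_; }
private:
    u32 begin_;
    u32 end_;
};

// Hypothetical stand-in for Par2Creator: records one hit per processed block.
class Creator {
public:
    explicit Creator(u32 blocks) : hits(blocks, 0) {}
    // Same shape as the article's ProcessData: walk [outputblk, endindex).
    void ProcessData(u32 outputblk, u32 endindex) {
        for (; outputblk != endindex; ++outputblk) ++hits[outputblk];
    }
    std::vector<int> hits;
};

// Body object: copyable, const operator(), forwards each subrange
// to the object that owns the data.
class ApplyProcess {
public:
    explicit ApplyProcess(Creator* obj) : _obj(obj) {}
    void operator()(const IndexRange& r) const {
        _obj->ProcessData(r.begin(), r.end());
    }
private:
    Creator* _obj;
};

// Hypothetical serial driver: hands the body one subrange at a time.
// chunk is assumed to be >= 1.
template <typename Body>
void run_in_chunks(u32 first, u32 last, u32 chunk, const Body& body) {
    for (u32 b = first; b < last; b += chunk) {
        const u32 e = (last - b < chunk) ? last : b + chunk;
        body(IndexRange(b, e));
    }
}

// Process every block exactly once and report the hit counts.
std::vector<int> process_all(u32 blocks, u32 chunk) {
    Creator creator(blocks);
    run_in_chunks(0u, blocks, chunk, ApplyProcess(&creator));
    return creator.hits;
}
```

Because each subrange touches a disjoint set of blocks, every block is processed exactly once whatever the chunk size. In a truly concurrent run, any state that subranges share (beyond their own blocks) would still need the thread-safety review Tan describes above.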
   
An award-winning magazine writer and the former editor in chief of Software Development, Alexandra Weber Morales is also a Webmaster, singer-songwriter, and recovering auto mechanic.