Intel Go Parallel

Intel Threading Building Blocks: The Pipeline Class
Applications that are ripe for pipelining include file compression by directory and complex convolutions on video frames.

In our first installment of the Featured Algorithm series, we looked at parallel_for, a popular and easy way to start using Intel Threading Building Blocks to gain performance increases on multi-core platforms. Next, we focus on the pipeline component. A pipeline functions like a factory assembly line, running a stream of inputs through a series of filters.
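To make the assembly-line picture concrete, here is a minimal sketch written with only the C++ standard library. It is not TBB code and it runs serially; it only shows how each input travels through the same ordered series of filters. The names Station, run_assembly_line, strip_spaces, to_upper, and add_brackets are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One station on the assembly line: takes an item, returns the transformed item.
typedef std::function<std::string(const std::string&)> Station;

// Send every input through every station, in order.
// Serial on purpose: this shows the data flow only, not the parallelism
// that a real pipeline adds on top of it.
std::vector<std::string> run_assembly_line(const std::vector<std::string>& inputs,
                                           const std::vector<Station>& stations)
{
  std::vector<std::string> outputs;
  outputs.reserve(inputs.size());
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    std::string item = inputs[i];
    for (std::size_t j = 0; j < stations.size(); ++j) {
      assert(static_cast<bool>(stations[j]));  // every station must be callable
      item = stations[j](item);
    }
    outputs.push_back(item);
  }
  return outputs;
}

// Three hypothetical stations.
std::string strip_spaces(const std::string& s)
{
  std::string out;
  for (std::size_t i = 0; i < s.size(); ++i)
    if (s[i] != ' ') out += s[i];
  return out;
}

std::string to_upper(const std::string& s)
{
  std::string out = s;
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(out[i])));
  return out;
}

std::string add_brackets(const std::string& s)
{
  return "[" + s + "]";
}
```

Pushing strip_spaces, to_upper, and add_brackets into a vector of Station and calling run_assembly_line on a list of strings sends each string through all three in that order. A pipeline library keeps this same flow but lets several items occupy different stations at the same time.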

To get a sense of how real-world developers are using Intel Threading Building Blocks, we spoke with Richard Bowler, CTO of Aeshen LLC, based in the greater Portland, Oregon area. His PackRat application, written in C++, uses multithreading via a TBB pipeline to speed up file compression.

Was it easy to add the pipeline class to your application?

It was very easy to add the required pipeline classes to the application. In fact, of all the TBB mechanisms I used over several sample applications, the pipeline mechanism was the easiest to implement as a neophyte with TBB. (Let me stress that all the TBB mechanisms I used worked very well and went in with a minimum of fuss, but pipeline was the easiest to implement of the bunch.)

Did you make any mistakes?

Actually, the pipeline code I added worked the first time. This is surprising when you think about it. I remember when I was using TBB for doing some loop parallelization, it took a couple of tries before I got the kinks out. But the pipeline was easy to understand and went in without a hitch.

What would be the most interesting use for this algorithm?

This is a great feature to use when you have to move large amounts of data through multi-step algorithms. Anytime you can separate an algorithm into discrete steps, you can break those steps up and parallelize the algorithm using pipeline. The setup is straightforward. Basically, you code the steps, and define intermediate data that gets passed from each step to the next, and at the end, your results roll out. Good examples of algorithms that are ripe for pipelining are file compression by directory (which I did in my example), and performing a complex convolution on a video frame.
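The recipe described in this answer (code the steps, define the intermediate data passed between them, collect the results at the end) can be sketched roughly as follows. This is a standard-library illustration, not PackRat or TBB code: std::async stands in for the parallel middle step, a toy run-length encoder stands in for real compression, and the names Chunk, split_input, encode_chunk, and process are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Intermediate data handed from step to step (hypothetical layout).
struct Chunk {
  std::size_t index;    // position in the input stream, set by step 1
  std::string raw;      // filled in by step 1
  std::string encoded;  // filled in by step 2
};

// Step 1 (serial): split the input into fixed-size chunks.
std::vector<Chunk> split_input(const std::string& input, std::size_t chunk_size)
{
  assert(chunk_size > 0);
  std::vector<Chunk> chunks;
  for (std::size_t pos = 0; pos < input.size(); pos += chunk_size) {
    Chunk c;
    c.index = chunks.size();
    c.raw = input.substr(pos, chunk_size);
    chunks.push_back(c);
  }
  return chunks;
}

// Step 2 (safe to run in parallel): run-length encode one chunk.
// It touches only its own argument, so chunks never interfere with each other.
Chunk encode_chunk(Chunk c)
{
  std::size_t i = 0;
  while (i < c.raw.size()) {
    std::size_t run = 1;
    while (i + run < c.raw.size() && c.raw[i + run] == c.raw[i]) ++run;
    c.encoded += std::to_string(run);
    c.encoded += c.raw[i];
    i += run;
  }
  return c;
}

// Step 3 (serial): gather the encoded chunks in their original order.
std::string process(const std::string& input, std::size_t chunk_size)
{
  std::vector<Chunk> chunks = split_input(input, chunk_size);
  std::vector<std::future<Chunk> > pending;
  for (std::size_t i = 0; i < chunks.size(); ++i)
    pending.push_back(std::async(std::launch::async, encode_chunk, chunks[i]));
  std::string result;
  for (std::size_t i = 0; i < pending.size(); ++i)
    result += pending[i].get().encoded;
  return result;
}
```

The design point is that only the middle step runs concurrently, and it can do so because every chunk carries its own input and output. The first and last steps stay serial so the results come out in input order. Note that this sketch starts one task per chunk with no upper limit, which a real pipeline would bound.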

What performance or productivity benefits did you gain?

I noticed real performance gains on my single-core computer, which was set up for hyperthreading. Given that what you're doing underneath is spawning concurrent threads to divide up work, the gains grow significantly when you move to multi-core systems.

How should a developer get started with pipeline?

Presuming you have an algorithm that fits the piping model, I'd say just dive in. There really isn't a huge learning curve. I went from zero to implemented in about four hours. It's a snap!

TBB Code Listing
Here is how the pipeline is created and run in the Aeshen PackRat source code.


This example was drawn from the Aeshen PackRat program.

Use of the TBB pipeline:
// create the filter objects for the pipeline
BlockCompress compressor;

// create the pipeline and insert filter
tbb::pipeline pipeline;
pipeline.add_filter(compressor);

// run the pipeline
// N is the maximum number of data items the pipeline will process at one time
pipeline.run(N);
// clear the pipeline before destruction
pipeline.clear();
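The role of N in the listing above can be illustrated without TBB. The sketch below, which uses only the C++ standard library, keeps at most max_tokens items in flight: a new item is started only after the oldest outstanding one has finished. It does not reproduce TBB's scheduler, and the names square_stage, run_bounded, g_in_flight, and g_peak are hypothetical; the two counters exist only so the limit can be observed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <future>
#include <vector>

// Counters used only to observe how many items are being processed at once.
std::atomic<int> g_in_flight(0);
std::atomic<int> g_peak(0);

// A stand-in for a parallel filter: squares its input and records concurrency.
int square_stage(int x)
{
  int now = ++g_in_flight;
  int peak = g_peak.load();
  // Raise the recorded peak if this call pushed concurrency higher.
  while (now > peak && !g_peak.compare_exchange_weak(peak, now)) {}
  int result = x * x;
  --g_in_flight;
  return result;
}

// Process all inputs while never having more than max_tokens outstanding.
std::vector<int> run_bounded(const std::vector<int>& inputs, std::size_t max_tokens)
{
  assert(max_tokens > 0);
  std::deque<std::future<int> > in_flight;
  std::vector<int> outputs;
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    if (in_flight.size() == max_tokens) {
      // All tokens are in use: wait for the oldest item before starting another.
      outputs.push_back(in_flight.front().get());
      in_flight.pop_front();
    }
    in_flight.push_back(std::async(std::launch::async, square_stage, inputs[i]));
  }
  // Drain whatever is still outstanding, oldest first.
  while (!in_flight.empty()) {
    outputs.push_back(in_flight.front().get());
    in_flight.pop_front();
  }
  return outputs;
}
```

A small limit caps memory use, since every in-flight item holds its own buffers, while a limit that is too small leaves cores idle. Choosing the value is therefore a tuning decision for the application.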


An example of a filter: 
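class BlockCompress : public tbb::filter
{
public:
  // false marks this as a parallel step: several blocks may be compressed at once
  BlockCompress(void) : tbb::filter(/*is serial step?*/ false) {}
  ~BlockCompress(void);
  // override the () operator, as required for use in a TBB pipeline
  void* operator() (void* item)
  {
    PRFileBlock* pBlock = static_cast<PRFileBlock*>(item);
    // scratch buffer for the compressed output; local, so each call has its own
    char buffer[PR_BLOCK_READ_SIZE+1024];
    // on input: capacity of the scratch buffer; on output: actual compressed size
    pBlock->nCompressedSize = PR_BLOCK_READ_SIZE+1024;
    // last three arguments: blockSize100k = 5, verbosity = 0, workFactor = 30
    int nCompressResult = BZ2_bzBuffToBuffCompress( buffer,
            &pBlock->nCompressedSize, pBlock->buffer,
            pBlock->nUncompressedSize, 5, 0, 30);
    if (nCompressResult == BZ_MEM_ERROR) return NULL;
    ASSERT(nCompressResult == BZ_OK);
    // copy the compressed bytes back into the block, replacing the original data
    memcpy_s(pBlock->buffer, PR_BLOCK_READ_SIZE+1024, buffer,
            pBlock->nCompressedSize);
    return pBlock;
  }
};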

An award-winning magazine writer and the former editor in chief of Software Development, Alexandra Weber Morales is also a Webmaster, singer-songwriter, and recovering auto mechanic.