Plan for the Future: Express Parallelism, Don't Manage It

When adding parallelism, the key design choice is to express concurrency in an application without explicitly managing the scheduling of that concurrency onto the hardware.



Parallelizing assign_fitness

After creating each new generation, the sample application must assign a fitness value to every individual in it. The serial implementation of assign_fitness in serial_ga.cpp uses std::for_each to walk the new children in the my_individuals vector and assign each one a fitness value:

   inline void 
   population::assign_fitness() {
     std::for_each( my_individuals.begin() + population_size, 
                    my_individuals.end(), set_fitness() );
   }
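The set_fitness() passed to std::for_each is a small function object whose definition isn't shown on this page; a minimal sketch (assuming the element type is named individual, which these snippets only imply) looks like this:

   // Sketch: assumes the vector's element type is named individual
   // and exposes the set_fitness() member the later snippets call.
   struct set_fitness {
      void operator() ( individual &ind ) const {
         ind.set_fitness();
      }
   };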
The process for implementing a natively threaded version of assign_fitness is similar to the one used in the preceding section for the threaded version of generate_children, so only a condensed version appears here. First, add code to create and join a set of native threads:

   inline void 
   population::assign_fitness() {
     // Launch one worker per thread, passing the loop index
     // through as the worker's thread id.
     for ( int t = 0; t < num_threads; ++t ) {
       handles[t] = _beginthread( &start_set_fitness, 0, (void *)t );
     }
     // Block until every worker has finished its chunk.
     WaitForMultipleObjects( num_threads, (HANDLE *)handles, 
                             true, INFINITE );
   }
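This condensed version relies on the same supporting declarations as generate_children; as a rough sketch (the names are the sample's, but the count and types here are assumptions):

   // Sketch only; the sample's actual declarations accompany
   // generate_children. Requires <process.h> for _beginthread
   // and <windows.h> for WaitForMultipleObjects.
   const int num_threads = 4;        // assumed worker count
   uintptr_t handles[num_threads];   // _beginthread returns uintptr_t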
Next, package the loop so you can pass it through the Windows threading API:

   inline void start_set_fitness( void *x ) {
      // Recover the thread id passed through the void* argument.
      population_helper::set_fitness( int(x) );
   }
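The loop index does double duty here: assign_fitness passes it as the thread id through _beginthread's void* argument, and the wrapper recovers it with the int(x) cast. (On a 64-bit build, a round trip through intptr_t would be the safer spelling, but the idea is the same.)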
Finally, modify the loop code to use the same adjust_begin_end scheduling routine as generate_children:

   void 
   population_helper::set_fitness( const int thread_id ) {
     // Narrow [begin, end) to this thread's share of the new
     // children, then assign fitness values within that subrange.
     size_t begin = population_size;
     size_t end = my_individuals->size();
     adjust_begin_end( thread_id, begin, end );
     for ( size_t i = begin; i < end; ++i ) {
       (*my_individuals)[i].set_fitness();
     }
   }
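For reference, adjust_begin_end is the block-partitioning helper introduced with generate_children; it shrinks [begin, end) to the chunk owned by thread_id. The article's actual routine appears in the preceding section, but a minimal sketch of that policy might look like this:

   // Sketch of a block-partitioning policy like adjust_begin_end;
   // the sample's real routine accompanies generate_children.
   void adjust_begin_end( const int thread_id, 
                          size_t &begin, size_t &end ) {
     const size_t chunk = ( end - begin + num_threads - 1 ) / num_threads;
     const size_t first = begin + thread_id * chunk;
     end = ( first + chunk < end ) ? first + chunk : end;
     begin = ( first < end ) ? first : end;
   }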
The TBB implementation simply replaces std::for_each with tbb::parallel_for, using a blocked_range of iterators:

   inline void assign_fitness() {
     tbb::parallel_for( tbb::blocked_range<vector_type::iterator>(  
               my_individuals.begin() + population_size, 
               my_individuals.end() ), 
               set_fitness_body(),
               tbb::auto_partitioner() );
   }
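The tbb::auto_partitioner tells the TBB runtime to pick chunk sizes itself, splitting the range only as far as load balancing requires; unlike the native version, nothing here depends on a hard-coded num_threads.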
Implementing set_fitness_body is straightforward:

   struct set_fitness_body {
      // Assign a fitness value to every individual in the subrange
      // that the TBB scheduler hands to this invocation.
      void operator() ( const tbb::blocked_range<vector_type::iterator> &range ) const {
         for ( vector_type::iterator i = range.begin(); 
               i != range.end(); ++i ) { 
            i->set_fitness();
         }
      }
   };
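Because the body is a function object with a const operator(), the scheduler is free to copy it and apply it to whichever subranges its worker threads pick up.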
Again, the natively threaded code is more difficult to write, uses a naive scheduling policy, and is tied to the use of num_threads. Of course, it's possible to write a set of advanced support routines using native threads that do a better job of managing the concurrency, but if you did that, you'd be writing a concurrency platform instead of focusing on the application's features.


