Parallelizing assign_fitness
After creating each generation, the sample application must assign a fitness value to each individual in that new generation. The serial implementation of assign_fitness in serial_ga.cpp uses std::for_each to iterate through the new children in the my_individuals vector and assign fitness values to them:
inline void
population::assign_fitness() {
    std::for_each( my_individuals.begin() + population_size,
                   my_individuals.end(), set_fitness() );
}
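To see the iterator arithmetic in isolation, here is a minimal, self-contained sketch of the same pattern. The individual struct, its genome field, and the placeholder fitness rule are assumptions made for illustration; they are not the sample application's real types.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the sample's individual type.
struct individual {
    int genome;
    int fitness;
};

// Function object applied to each element, playing the role of set_fitness.
struct set_fitness {
    void operator()( individual &ind ) const {
        ind.fitness = ind.genome * 2;  // placeholder fitness rule (assumption)
    }
};

// Assigns fitness only to elements at index >= population_size, that is,
// the children stored after the existing population.
// Precondition: population_size <= individuals.size().
inline void assign_fitness( std::vector<individual> &individuals,
                            std::size_t population_size ) {
    std::for_each( individuals.begin()
                       + static_cast<std::ptrdiff_t>( population_size ),
                   individuals.end(), set_fitness() );
}
```

Calling assign_fitness( v, 3 ) on a six-element vector leaves the first three elements untouched and updates only the last three.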
The process for implementing a natively threaded version of assign_fitness is similar to the one you saw in the preceding section for generate_children, so you'll see a condensed version here. First, add code to create and join a set of native threads:
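inline void
population::assign_fitness() {
    // Caution: _beginthread closes the thread handle itself when the thread
    // exits, so waiting on these handles can race; _beginthreadex returns a
    // handle that stays valid until you close it.
    for ( int t = 0; t < num_threads; ++t ) {
        handles[t] = _beginthread( &start_set_fitness, 0, (void *)t );
    }
    WaitForMultipleObjects( num_threads, (HANDLE *)handles,
                            true, INFINITE );
}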
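The create-then-join structure above is not specific to Windows. As a rough sketch of the same shape in portable code, the following swaps in C++11 std::thread for _beginthread and WaitForMultipleObjects. It is not the sample's code: work_on_slice, run_in_threads, and the squaring workload are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-thread work: square every element in [begin, end).
inline void work_on_slice( std::vector<int> &data,
                           std::size_t begin, std::size_t end ) {
    for ( std::size_t i = begin; i < end; ++i ) {
        data[i] *= data[i];
    }
}

// Fork one thread per contiguous slice, then join them all before returning.
inline void run_in_threads( std::vector<int> &data, unsigned num_threads ) {
    std::vector<std::thread> threads;
    const std::size_t n = data.size();
    for ( unsigned t = 0; t < num_threads; ++t ) {
        // Slice boundaries are computed so the slices cover [0, n) exactly.
        const std::size_t begin = n * t / num_threads;
        const std::size_t end   = n * ( t + 1 ) / num_threads;
        threads.emplace_back( work_on_slice, std::ref( data ), begin, end );
    }
    for ( std::thread &th : threads ) {
        th.join();  // plays the role of WaitForMultipleObjects
    }
}
```

Because each thread receives its slice bounds as ordinary arguments, this version needs no cast through void *. It still fixes the number of threads up front, which is the limitation the rest of this section discusses.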
Next, package the loop so you can pass it through the Windows threading API:
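inline void start_set_fitness( void *x ) {
    // Recover the thread id that was passed through the void * argument.
    // Going through intptr_t avoids a truncating pointer-to-int cast on
    // 64-bit builds.
    population_helper::set_fitness(
        static_cast<int>( reinterpret_cast<intptr_t>( x ) ) );
}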
Finally, modify the loop code to use the same adjust_begin_end scheduling routine as generate_children:
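// Declared static inside population_helper; the keyword is not repeated
// on the out-of-class definition.
void
population_helper::set_fitness( const int thread_id ) {
    size_t begin = population_size;
    size_t end = my_individuals->size();
    // Narrow [begin, end) to the part owned by this thread.
    adjust_begin_end( thread_id, begin, end );
    for ( size_t i = begin; i < end; ++i ) {
        (*my_individuals)[i].set_fitness();
    }
}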
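The body of adjust_begin_end is not shown in this section. As a hedged sketch of what a static scheduling routine with that call shape could look like, the following divides the range into num_threads contiguous slices of nearly equal size. The implementation and the fixed thread count of 4 are assumptions; the sample's actual routine may differ.

```cpp
#include <cstddef>

// Assumed fixed thread count; the sample defines its own num_threads.
const int num_threads = 4;

// Hypothetical implementation: narrows [begin, end) to the contiguous
// slice owned by thread_id (0 <= thread_id < num_threads).
inline void adjust_begin_end( int thread_id,
                              std::size_t &begin, std::size_t &end ) {
    const std::size_t total = end - begin;
    const std::size_t n     = static_cast<std::size_t>( num_threads );
    const std::size_t id    = static_cast<std::size_t>( thread_id );
    const std::size_t chunk = total / n;
    const std::size_t extra = total % n;  // first `extra` threads get one more
    const std::size_t offset = id * chunk + ( id < extra ? id : extra );
    const std::size_t length = chunk + ( id < extra ? 1 : 0 );
    begin += offset;
    end = begin + length;
}
```

A split like this is what makes the scheduling policy static: every thread receives the same number of individuals regardless of how long each fitness evaluation takes, so one slow slice holds up the whole join.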
The TBB implementation simply replaces std::for_each with tbb::parallel_for by using a blocked_range of iterators:
inline void assign_fitness() {
    tbb::parallel_for( tbb::blocked_range<vector_type::iterator>(
                           my_individuals.begin() + population_size,
                           my_individuals.end() ),
                       set_fitness_body(),
                       tbb::auto_partitioner() );
}
Implementing set_fitness_body is straightforward:
struct set_fitness_body {
    void operator() ( const tbb::blocked_range<vector_type::iterator>
                      &range ) const {
        for ( vector_type::iterator i = range.begin();
              i != range.end(); ++i ) {
            i->set_fitness();
        }
    }
};
Again, the natively threaded code is more difficult to write, uses a naive scheduling policy, and is tied to the use of num_threads. Of course, it's possible to write a set of advanced support routines using native threads to do a better job of managing the concurrency, but if you did that, you'd be writing a concurrency platform instead of focusing on the application's features.