

Plan for the Future: Express Parallelism, Don't Manage It  : Page 3

When adding parallelism, the key design choice is to express concurrency in an application without explicitly managing the scheduling of that concurrency onto the hardware.





Parallelizing assign_fitness

After creating each generation, the sample application must assign a fitness value to each individual in that new generation. The serial implementation of assign_fitness in serial_ga.cpp uses std::for_each to iterate through the new children in the my_individuals vector to assign fitness values to them:

inline void population::assign_fitness() {
    std::for_each( my_individuals.begin() + population_size,
                   my_individuals.end(),
                   set_fitness() );
}
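To make the serial pattern concrete, here is a minimal, self-contained sketch. The `individual` type, its sum-of-genes fitness, and the free function `assign_fitness_serial` are hypothetical stand-ins invented for illustration; the article's real classes live in serial_ga.cpp and may look quite different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical individual: its fitness is simply the sum of its genes.
struct individual {
    std::vector<int> genes;
    int fitness = -1;                     // -1 marks "not yet evaluated"
    void set_fitness() {
        fitness = 0;
        for (int g : genes) fitness += g;
    }
};

// Function object playing the role of the article's set_fitness functor.
struct set_fitness {
    void operator()(individual &ind) const { ind.set_fitness(); }
};

// Evaluates only the children stored after the first population_size entries.
inline void assign_fitness_serial(std::vector<individual> &individuals,
                                  std::size_t population_size) {
    assert(population_size <= individuals.size());
    std::for_each(individuals.begin() + static_cast<std::ptrdiff_t>(population_size),
                  individuals.end(),
                  set_fitness());
}
```

In this sketch the parents at the front of the vector keep whatever fitness they already had, and only the newly created children are visited.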

Implementing a natively threaded version of assign_fitness follows the same process you saw in the preceding section for the threaded version of generate_children, so only a condensed version appears here. First, add code to create and join a set of native threads:

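inline void population::assign_fitness() {
    // Start one native thread per slot; the thread id travels in the void* argument.
    for ( int t = 0; t < num_threads; ++t ) {
        handles[t] = _beginthread( &start_set_fitness, 0, (void *)t );
    }
    // Wait for all threads. Caution: _beginthread closes its handle when the
    // thread exits, so _beginthreadex is the safer choice when waiting on handles.
    WaitForMultipleObjects( num_threads, (HANDLE *)handles, true, INFINITE );
}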

Next, package the loop so you can pass it through the Windows threading API:

inline void start_set_fitness( void *x ) {
    population_helper::set_fitness( int(x) );
}

Finally, modify the loop code to use the same adjust_begin_end scheduling routine as generate_children:

// Declared static inside population_helper; the keyword is not repeated on the definition.
void population_helper::set_fitness( const int thread_id ) {
    size_t begin = population_size;
    size_t end = my_individuals->size();
    adjust_begin_end( thread_id, begin, end );   // narrow to this thread's share
    for ( size_t i = begin; i < end; ++i ) {
        (*my_individuals)[i].set_fitness();
    }
}
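For readers who want to try the static scheduling idea outside Windows, the following is a rough sketch, not the article's code. The helper `adjust_begin_end_sketch` is a hypothetical stand-in for adjust_begin_end (its real body is not shown on this page), and `run_static_chunks` swaps the Windows threading calls for the portable `std::thread`. Both names, and the explicit `num_threads` parameter, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for adjust_begin_end: narrows [begin, end) to the
// contiguous chunk owned by thread_id under an even static split.
inline void adjust_begin_end_sketch(int thread_id, int num_threads,
                                    std::size_t &begin, std::size_t &end) {
    assert(num_threads > 0 && thread_id >= 0 && thread_id < num_threads);
    const std::size_t n     = static_cast<std::size_t>(num_threads);
    const std::size_t id    = static_cast<std::size_t>(thread_id);
    const std::size_t total = end - begin;
    const std::size_t chunk = total / n;
    const std::size_t extra = total % n;   // the first `extra` threads get one more item
    const std::size_t offset = id * chunk + (id < extra ? id : extra);
    const std::size_t length = chunk + (id < extra ? 1 : 0);
    begin += offset;
    end = begin + length;
}

// Hypothetical driver: one std::thread per chunk, then join them all.
// work(i) is called exactly once for every index i in [begin, end).
template <typename Work>
void run_static_chunks(std::size_t begin, std::size_t end,
                       int num_threads, Work work) {
    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(num_threads));
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([=]() {
            std::size_t b = begin, e = end;
            adjust_begin_end_sketch(t, num_threads, b, e);
            for (std::size_t i = b; i < e; ++i) work(i);
        });
    }
    for (std::thread &th : threads) th.join();
}
```

Each thread's share is fixed before any work starts, so a thread that happens to draw expensive individuals cannot hand work to an idle neighbor. That is the sense in which this kind of scheduling is naive.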

The TBB implementation simply replaces std::for_each with tbb::parallel_for by using a blocked_range of iterators:

inline void assign_fitness() {
    tbb::parallel_for(
        tbb::blocked_range<vector_type::iterator>( my_individuals.begin() + population_size,
                                                   my_individuals.end() ),
        set_fitness_body(),
        tbb::auto_partitioner() );
}

Implementing set_fitness_body is straightforward:

struct set_fitness_body {
    void operator() ( const tbb::blocked_range<vector_type::iterator> &range ) const {
        for ( vector_type::iterator i = range.begin(); i != range.end(); ++i ) {
            i->set_fitness();
        }
    }
};
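Notice that the body only ever sees a range and never a thread id. To illustrate why that matters, here is a toy sketch of range-based splitting. It is not how TBB is implemented (TBB uses its own task scheduler); `index_range` and `parallel_for_sketch` are hypothetical names, and `std::async` is used purely as an assumed stand-in for a real scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical half-open index range, loosely modeled on the blocked_range idea.
struct index_range {
    std::size_t first, last, grain;
    std::size_t begin() const { return first; }
    std::size_t end() const { return last; }
    bool is_divisible() const { return last - first > grain; }
};

// Splits the range in half until pieces are no larger than the grain size,
// then applies body to each piece. The body never learns how many threads exist.
template <typename Body>
void parallel_for_sketch(const index_range &r, const Body &body) {
    assert(r.grain > 0 && r.first <= r.last);
    if (!r.is_divisible()) {
        body(r);
        return;
    }
    const std::size_t mid = r.first + (r.last - r.first) / 2;
    const index_range left  = {r.first, mid, r.grain};
    const index_range right = {mid, r.last, r.grain};
    // Assumption: std::async stands in for a real task scheduler here.
    std::future<void> pending = std::async(std::launch::async,
        [&left, &body]() { parallel_for_sketch(left, body); });
    parallel_for_sketch(right, body);   // handle the right half on this thread
    pending.get();                      // wait for the left half; rethrows on error
}
```

A body written this way stays valid whether the pieces run on one thread or many, which is the property the TBB version above relies on.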

Again, the natively threaded code is more difficult to write, uses a naive scheduling policy, and is tied to the use of num_threads. Of course, it's possible to write a set of advanced support routines using native threads to do a better job of managing the concurrency, but if you did that, you'd be writing a concurrency platform instead of focusing on the application's features.
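As a taste of what such a support routine might involve, here is a small sketch of dynamic (self-scheduling) chunking, where idle threads claim the next chunk from a shared atomic counter. This is a swapped-in technique for illustration only; `run_dynamic_chunks` and its parameters are hypothetical and appear nowhere in the article's sample code.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical dynamic scheduler: threads repeatedly claim the next unclaimed
// chunk of indices, so faster threads naturally take on more of the work.
template <typename Work>
void run_dynamic_chunks(std::size_t begin, std::size_t end,
                        int num_threads, std::size_t chunk, Work work) {
    assert(num_threads > 0 && chunk > 0 && begin <= end);
    std::atomic<std::size_t> next(begin);   // first index nobody has claimed yet
    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(num_threads));
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&next, end, chunk, work]() {
            for (;;) {
                const std::size_t b = next.fetch_add(chunk);   // claim a chunk
                if (b >= end) break;                           // nothing left
                const std::size_t e = std::min(b + chunk, end);
                for (std::size_t i = b; i < e; ++i) work(i);
            }
        });
    }
    for (std::thread &th : threads) th.join();
}
```

Even this small routine still has to choose a chunk size and a thread count by hand, and it does nothing about nested parallelism or cache locality. Those are exactly the concerns a concurrency platform takes on for you.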
