Statistics Made Easier with STL

Programmers developing financial, scientific, and numerical analysis applications often need to reinvent the wheel, implementing statistical functions to calculate the mean, median, percentiles, and similar statistical data. This solution shows you how to implement some of these operations with a few useful STL algorithms.


How can you implement statistical functions for calculating mean, median, and similar operations?


Use the algorithms defined in the <numeric> and <algorithm> libraries.

Mean and Lean
The operations required for calculating the average of a range of elements consist of summing up all the values within that range and dividing the result by the number of elements. This task can become unduly complex when you have to deal with various types of ranges and looping through containers in order to accumulate their sum. However, using the right STL algorithms, it’s a cinch.

The first step consists of summing up all the values in a range. For this purpose, use the accumulate() algorithm defined in <numeric>. This algorithm accumulates all elements within a range into a single value. accumulate() has two overloaded versions; for the sake of brevity, the first version is used here, with the following prototype:

template <class InputIterator, class T>
T accumulate(InputIterator first,
             InputIterator last,
             T init);

The first two parameters mark the boundaries of the range. The third argument is an initial value that is added to the result. Usually, it’s 0 but under certain conditions, you may need to provide a different initial value.

Author’s Note: To avoid truncation and rounding problems, use the floating point datatype with the highest precision supported by your compiler: double or long double.
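The note above matters in practice because accumulate() sums in the type of its third argument, not in the element type of the range. The following is a minimal sketch of that behavior; the helper names sum_with_int_init and sum_with_double_init are hypothetical and exist only for illustration.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helpers for illustration; they are not part of the article.

// The literal 0 makes T = int, so every partial sum is converted back to
// int and the fractional parts are discarded along the way.
inline int sum_with_int_init(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// The literal 0.0 makes T = double, so the sum keeps its fractional part.
inline double sum_with_double_init(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}
```

For a vector holding four copies of 0.5, the first helper returns 0 while the second returns 2.0, so passing 0.0 (or 0.0L for long double) is the safer habit when the elements are floating point values.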

Suppose you have a container that stores students’ grades:

vector<int> grades;
grades.push_back(89);
grades.push_back(74);
grades.push_back(89);
grades.push_back(63);
grades.push_back(100);

First, accumulate all the grades:

double res=accumulate(grades.begin(), grades.end(),0);

Next, calculate the average:

res=res/grades.size();

You can accomplish these two operations in one shot:

double res=accumulate(grades.begin(), grades.end(), 0)/double(grades.size());
cout << res << endl;

The grades needn't be stored in a container object; accumulate(), like every other STL algorithm, also works on a built-in array:

int grades[]={89, 74, 89, 63, 100};
size_t range_size=sizeof(grades)/sizeof(grades[0]);
double res=accumulate(grades, grades+range_size, 0)/double(range_size);

Median
A median is the value that splits a range into two halves: half of the values are lower than or equal to the median, and the other half are higher than or equal to it. For example, in the range {60, 70, 89, 95, 100} the median is 89. It's easier to calculate the median when the range is sorted. If you're using a self-sorting associative container such as set or multiset, you don't need to worry about sorting. If, however, the values are stored in a vector, simply call the sort() algorithm first:

sort(grades.begin(), grades.end());

Next, calculate the median like this:

cout << *(grades.begin()+grades.size()/2) << endl;
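The sort-then-pick-the-middle approach can be sketched as a single function. The name median_sorted is hypothetical, and the handling of an even number of elements (averaging the two middle values) is a common convention assumed here; the article itself only discusses the odd-sized case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not from the original article. The vector is taken
// by value so that the caller's data is left in its original order.
double median_sorted(std::vector<int> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t mid = v.size() / 2;
    if (v.size() % 2 == 1) {
        return v[mid];  // odd count: the single middle element
    }
    // Even count: average the two middle elements (assumed convention).
    return (v[mid - 1] + v[mid]) / 2.0;
}
```

For {60, 70, 89, 95, 100} this returns 89, and for the four-element range {20, 30, 60, 100} it returns 45.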

If, for some reason, you prefer not to sort the container (for example, if you modify the container frequently), you can use the nth_element() algorithm instead. nth_element() ensures that the nth element in the container contains the value that would be stored in that position if the container were sorted. In addition, this algorithm ensures that all elements prior to the nth position would also precede that position in an ordered collection, and that all elements following the nth position would also follow that position in an ordered collection. nth_element() does rearrange the elements, but it doesn't fully sort the container:

nth_element(grades.begin(),
            grades.begin()+grades.size()/2,
            grades.end());
int median=*(grades.begin()+grades.size()/2);
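The guarantee described above can be checked directly. In this sketch, median_nth and is_partitioned_around are hypothetical names introduced for illustration; the first returns the middle element via nth_element(), and the second verifies that nothing before position n is larger, and nothing after it is smaller, than the element at n.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the element at position size()/2 of the
// would-be sorted order, working on a copy so the caller's vector is
// left untouched.
int median_nth(std::vector<int> v) {
    assert(!v.empty());
    std::vector<int>::iterator mid = v.begin() + v.size() / 2;
    std::nth_element(v.begin(), mid, v.end());
    return *mid;
}

// Hypothetical checker for the nth_element() postcondition at index n.
bool is_partitioned_around(const std::vector<int>& v, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        if (v[i] > v[n]) return false;   // an earlier element is too large
    for (std::size_t i = n + 1; i < v.size(); ++i)
        if (v[i] < v[n]) return false;   // a later element is too small
    return true;
}
```

The order of the elements on either side of position n is unspecified, which is why nth_element() can be cheaper than a full sort().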

The median is the 50th percentile. To find the element at a different percentile, say the 25th, use the following nth_element() call. For the range {60, 70, 89, 95, 100}, the offset int(5*0.25) is 1, so the result is 70, the second-smallest value:

nth_element(grades.begin(),
            grades.begin()+int(grades.size()*0.25),
            grades.end());
int p_25=*(grades.begin()+int(grades.size()*0.25));
cout << p_25 << endl;

Note that some compilers can't interpret the second argument of nth_element() without the explicit conversion to int.
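The percentile lookup generalizes to any fraction. The sketch below assumes the same index rule as the example above, namely truncating size()*fraction to an integer; other percentile definitions exist, and the name percentile_value is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not from the original article. 'fraction' is the
// percentile expressed in the range [0, 1), e.g. 0.25 for the 25th.
int percentile_value(std::vector<int> v, double fraction) {
    assert(!v.empty());
    assert(fraction >= 0.0 && fraction < 1.0);
    // Explicit conversion to an integral offset, as in the article.
    std::size_t offset = static_cast<std::size_t>(v.size() * fraction);
    if (offset >= v.size()) offset = v.size() - 1;  // guard against rounding
    std::vector<int>::iterator nth = v.begin() + offset;
    std::nth_element(v.begin(), nth, v.end());
    return *nth;
}
```

For {60, 70, 89, 95, 100}, a fraction of 0.25 gives 70 and a fraction of 0.5 gives the median, 89.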

Partitions
Sometimes you need to divide a range into two parts: all elements that satisfy a certain criterion, followed by all elements that don't. For example, to find out how many grades below 60 the range {20,30, 60,100} contains, use the partition() algorithm. partition() takes two iterators indicating the range's boundaries and a predicate. In this example, the predicate object is called smaller_than_sixty. partition() returns an iterator that is one past the end of the group of elements that satisfy this predicate:

for (vector<int>::iterator it=grades.begin(); it ...
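A complete version of the partition step might look like the following sketch. The article names the predicate smaller_than_sixty but does not show its body, so the definition here is an assumption, and count_below_sixty is a hypothetical wrapper added for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Assumed definition of the predicate named in the article.
bool smaller_than_sixty(int grade) {
    return grade < 60;
}

// Hypothetical helper: moves all grades below 60 to the front of the
// vector and returns how many there are.
std::size_t count_below_sixty(std::vector<int>& grades) {
    // partition() returns an iterator one past the last element that
    // satisfies the predicate.
    std::vector<int>::iterator boundary =
        std::partition(grades.begin(), grades.end(), smaller_than_sixty);
    return static_cast<std::size_t>(std::distance(grades.begin(), boundary));
}
```

For the range {20, 30, 60, 100} the helper returns 2. If only the count is needed and the elements should stay where they are, count_if() from <algorithm> with the same predicate is an alternative.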

Without Deviating from the Standard
Although C++ doesn't have a statistics package, the <numeric> and <algorithm> libraries contain many useful algorithms that significantly simplify the implementation of such a home-made library, as shown in this solution.
