

Statistics Made Easier with STL: Page 3





A median is the value that splits a range into two halves: half of the values are lower than or equal to the median, and the other half are higher than or equal to it. For example, in the range {60, 70, 89, 95, 100} the median is 89. It's easier to calculate the median when the range is sorted. If the values are already kept in a sorted associative container such as set or multiset, you don't need to worry about sorting. If, however, the results are stored in a vector, simply call the sort() algorithm first:

sort(grades.begin(), grades.end());

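Next, read the middle element. For a range with an odd number of elements this is the median; for an even number of elements it is the upper of the two middle values: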

cout<<*(grades.begin()+grades.size()/2); //89
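The sort-then-index approach above can be wrapped in a small function. The following is a minimal sketch, not a definitive implementation: the name median_sorted is hypothetical, and averaging the two middle values for an even-sized range is one common convention that the text above does not prescribe.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns the median of the given grades.
// The vector is taken by value, so the caller's container is not reordered.
double median_sorted(std::vector<int> grades)
{
    assert(!grades.empty());                  // a median needs at least one value
    std::sort(grades.begin(), grades.end());

    const std::vector<int>::size_type mid = grades.size() / 2;
    if (grades.size() % 2 != 0)
        return grades[mid];                   // odd count: the single middle element

    // Even count (assumed convention): average the two middle elements.
    return (grades[mid - 1] + grades[mid]) / 2.0;
}
```

Calling median_sorted() with the grades 60, 70, 89, 95 and 100 in any order yields 89.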

If, for some reason, you prefer not to sort the container (for example, if you modify the container frequently), you can use the nth_element() algorithm instead, which runs in linear time on average. nth_element() rearranges the container so that the nth position holds the value that would be stored there if the container were sorted. In addition, no element before the nth position is greater than that value, and no element after it is smaller. However, nth_element() doesn't fully sort the container; the elements on either side of the nth position are left in unspecified order:

nth_element(grades.begin(), grades.begin()+grades.size()/2, grades.end());
median=*(grades.begin()+grades.size()/2);
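As a self-contained illustration of the nth_element() technique, here is a minimal sketch. The function name median_nth is hypothetical, and the sketch assumes that reordering the caller's vector is acceptable.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: finds the middle element without a full sort.
// The vector is partially reordered in place. For an even number of
// elements, the upper of the two middle values is returned.
int median_nth(std::vector<int>& grades)
{
    assert(!grades.empty());
    std::vector<int>::iterator mid = grades.begin() + grades.size() / 2;

    // After this call *mid holds the value a full sort would put there;
    // nothing before mid is greater than it, nothing after mid is smaller.
    std::nth_element(grades.begin(), mid, grades.end());
    return *mid;
}
```

Only the middle position is guaranteed to hold its sorted value afterwards, so this is a good fit when just the median is needed.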

The median is simply the 50th percentile. To find the element at a different percentile, say the 25th, scale the size of the range by the corresponding fraction and pass the resulting position to nth_element(). For the range {60, 70, 89, 95, 100}, the position is int(5*0.25), which is 1, so the result is 70:

nth_element(grades.begin(), grades.begin()+int(grades.size()*0.25), grades.end());
int p_25=*(grades.begin()+int(grades.size()*0.25));
cout<<p_25<<" is @ the 25th percentile"<<endl;

Note that the explicit conversion to int in the second argument of nth_element() matters: iterator arithmetic expects an integral offset, and some compilers reject a floating-point one.
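The percentile lookup generalizes to any fraction. The sketch below is one possible way to do it, under stated assumptions: the name percentile_value is hypothetical, the position is computed by truncation as in the call above, and clamping a fraction of 1.0 to the last element is a choice made for this sketch rather than something the text specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns the element found at position
// size * fraction (truncated), where fraction lies in [0.0, 1.0].
// The vector is partially reordered in place.
int percentile_value(std::vector<int>& grades, double fraction)
{
    assert(!grades.empty());
    assert(fraction >= 0.0 && fraction <= 1.0);

    // Explicit conversion: iterator arithmetic needs an integral offset.
    std::vector<int>::size_type index =
        static_cast<std::vector<int>::size_type>(grades.size() * fraction);

    // Assumed convention: a fraction of 1.0 maps to the last element.
    if (index >= grades.size())
        index = grades.size() - 1;

    std::nth_element(grades.begin(), grades.begin() + index, grades.end());
    return grades[index];
}
```

With the grades 60, 70, 89, 95 and 100, a fraction of 0.25 selects position 1 and returns 70, while 0.5 returns the median, 89.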

Sometimes you need to divide a range into two parts: all elements that satisfy a certain criterion, followed by all elements that don't. For example, to find out how many grades below 60 the range {20, 30, 60, 100} contains, use the partition() algorithm. partition() takes two iterators indicating the range's boundaries and a predicate. In this example, the predicate object is called smaller_than_sixty. partition() returns an iterator that is one past the end of the group of elements that satisfy this predicate, so the distance from grades.begin() to that iterator is the number of such grades. With the returned iterator stored in part, the following loop prints every grade below 60:

for (vector<int>::iterator it=grades.begin(); it != part; ++it)
{
 cout<<*it<<" is smaller than 60"<<endl;
}
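For completeness, here is a minimal sketch of the whole partition() step. The definition of smaller_than_sixty is an assumption, since the text names the predicate but does not show it, and the helper name count_failing is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed definition of the predicate named in the text.
struct smaller_than_sixty
{
    bool operator()(int grade) const { return grade < 60; }
};

// Hypothetical helper: moves all grades below 60 to the front of the
// vector and returns how many of them there are.
std::vector<int>::difference_type count_failing(std::vector<int>& grades)
{
    // part points one past the last element that satisfies the predicate.
    std::vector<int>::iterator part =
        std::partition(grades.begin(), grades.end(), smaller_than_sixty());

    // Sanity check: count_if() gives the same number without reordering.
    assert(std::count_if(grades.begin(), grades.end(), smaller_than_sixty())
           == part - grades.begin());

    return part - grades.begin();
}
```

For the range {20, 30, 60, 100} the function returns 2. If only the count matters and the elements should stay where they are, count_if() alone is enough; partition() is worthwhile when you also want the matching elements grouped together.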

Without Deviating from the Standard
Although C++ doesn't have a statistics package, the <algorithm> and <numeric> headers contain many useful algorithms that significantly simplify writing a home-made one, as this solution shows.

Danny Kalev is a certified system analyst and software engineer specializing in C++. He was a member of the C++ standards committee between 1997 and 2000 and has since been involved informally in the C++0x standardization process. He is the author of "The ANSI/ISO Professional C++ Programmer's Handbook" and "The Informit C++ Reference Guide: Techniques, Insight, and Practical Advice on C++."