Statistics and reports analyze how phenomena of all kinds change over time. For example, you could evaluate an employee's performance by analyzing the progress curves in reports; managers can make business decisions based on statistical sales data; meteorologists can predict natural disasters based on statistical weather pattern data, and the list goes on. For the software industry, statistics and reports are both an ongoing challenge and an ongoing market. Today, programming languages such as PHP and Java offer ready-made packages for building applications around statistical problems.
This article explores PHP's support for the statistical domain. You will see how to generate reports and statistics for simple text phrases, XML documents, and complex databases.
The Text_Statistics PEAR Package
The Text_Statistics PEAR package makes it easy to calculate some basic readability metrics on blocks of text. These metrics include the number of words, the number of unique words, the number of sentences, and the total number of syllables. You can use these statistics to calculate the Flesch score for a text, a number that typically falls between 0 and 100 and represents readability. Figure 1 shows the formula for the Flesch Reading Ease Score (FRES) test.
Figure 1. Flesch Reading Ease Score Formula: This formula calculates the relative readability of a text; the higher the score, the easier the text is to read.
The higher the score, the more readable the text, and the larger its potential audience. For example, a Flesch score between 90 and 100 corresponds roughly to a fifth-grade reading level, while a score between 0 and 30 means the text is best understood by college graduates. This tutorial provides more details about the Flesch readability formula and the related Flesch-Kincaid Grade Level test.
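The formula in Figure 1 is easy to check by hand. The following is a minimal sketch, assuming the standard FRES coefficients (206.835, 1.015, and 84.6); the function name fleschReadingEase() is hypothetical and is not part of the Text_Statistics API. Feeding it the counts that Text_Statistics reports later in this article for "This is an example." (4 words, 1 sentence, 5 syllables) reproduces the package's Flesch value of 97.025.

```php
<?php
// Hypothetical helper, NOT part of Text_Statistics: computes the Flesch
// Reading Ease Score from raw counts using the standard coefficients:
//   206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
function fleschReadingEase(int $words, int $sentences, int $syllables): float
{
    if ($words <= 0 || $sentences <= 0) {
        throw new InvalidArgumentException('words and sentences must be positive');
    }
    $wordsPerSentence = $words / $sentences;   // average sentence length
    $syllablesPerWord = $syllables / $words;   // average word length
    return 206.835 - 1.015 * $wordsPerSentence - 84.6 * $syllablesPerWord;
}

// Counts for "This is an example.": 4 words, 1 sentence, 5 syllables
printf("%.3f\n", fleschReadingEase(4, 1, 5)); // prints 97.025
```

Note that the formula itself is not clamped: very short, monosyllabic text can score above 100 (one 10-word sentence with 10 syllables scores 112.085), which is why the 0 to 100 range is only the typical one.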
You install the PEAR package like this (version 1.0 is the stable version):
pear install Text_Statistics
As the next two examples show, the Text_Statistics PEAR package is easy to use: the API is straightforward and the code is intuitive. For example, to retrieve the number of syllables in a word, use the Text_Word class:
<?php
// Text_Word ships with the Text_Statistics PEAR package
require_once 'Text/Word.php';

// the tested word
$word = 'paragraphs';
// create an instance of the Text_Word class
$stats = new Text_Word($word);
// print the number of syllables in $word
echo "The word '" . $word . "' has " . $stats->numSyllables() . " syllables.";
The output of this example is:
The word 'paragraphs' has 3 syllables.
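To give a feel for what a syllable counter has to do, here is a deliberately crude sketch that counts runs of vowels and then drops a silent trailing "e". It is an assumption-laden illustration, not the algorithm Text_Word uses, and its answers can differ from the package's; the function name estimateSyllables() is hypothetical.

```php
<?php
// Hypothetical, deliberately crude syllable estimator for illustration only.
// It is NOT the algorithm Text_Word uses and will miscount many words.
function estimateSyllables(string $word): int
{
    // Keep letters only, lowercased
    $word = strtolower(preg_replace('/[^a-z]/i', '', $word));
    if ($word === '') {
        return 0;
    }
    // Each run of consecutive vowels (treating "y" as a vowel) counts as one syllable
    $count = preg_match_all('/[aeiouy]+/', $word);
    // A trailing "e" is usually silent ("make", "whale"), except after a
    // consonant + "l" ending such as "example", where it is pronounced
    if ($count > 1 && substr($word, -1) === 'e' && !preg_match('/[^aeiouy]le$/', $word)) {
        $count--;
    }
    // Every word has at least one syllable
    return max(1, $count);
}

echo "The word 'paragraphs' has " . estimateSyllables('paragraphs') . " syllables.\n";
```

Heuristics like this are why dedicated packages exist: English spelling has many exceptions that simple vowel counting cannot capture.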
Here's a more complete example based on the Text_Statistics class that analyzes a complete (but still short) sentence. In this case the output contains more information, including the number of syllables, the number of unique words, the Flesch score, the abbreviations, and more:
<?php
// Text_Statistics is part of the PEAR package of the same name
require_once 'Text/Statistics.php';

// the tested text
$text = "This is an example.";
// create an instance of the Text_Statistics class
$stats = new Text_Statistics($text);

// dump the whole object (including the abbreviations it expands), then
// print selected values: syllables, words, unique words, sentences, Flesch score
echo "<b>The entire array:</b><br />";
print_r($stats);
echo "<br /><br />";
echo "<b>Text:</b> " . $stats->text . "<br />";
echo "<b>Syllables:</b> " . $stats->numSyllables . "<br />";
echo "<b>Words number:</b> " . $stats->numWords . "<br />";
echo "<b>Unique words:</b> " . $stats->uniqWords . "<br />";
echo "<b>Sentences number:</b> " . $stats->numSentences . "<br />";
echo "<b>Flesch:</b> " . $stats->flesch . "<br />";
The output of this example shows both the raw data held by the Text_Statistics object, as dumped by print_r(), and a list of specific values extracted from that object and formatted more readably:
The entire array:
Text_Statistics Object ( [text] => This is an example.
[numSyllables] => 5
[numWords] => 4
[uniqWords] => 4
[numSentences] => 1
[flesch] => 97.025
[_abbreviations] => Array ( [/Mr\./] => Mister
[/Mrs\./i] => Misses [/etc\./i] => etcetera
[/Dr\./i] => Doctor )
[_uniques] => Array (
[this] => 1
[is] => 1
[an] => 1
[example] => 1 ) )
Text: This is an example.
Syllables: 5
Words number: 4
Unique words: 4
Sentences number: 1
Flesch: 97.025
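For comparison, the word, unique-word, and sentence counts above can be approximated in a few lines of plain PHP. This is a minimal sketch under stated assumptions: words are runs of letters and apostrophes, sentences are pieces separated by ".", "!", or "?", and the function name basicTextStats() is hypothetical. Text_Statistics applies its own rules (such as expanding the abbreviations listed in the dump), so results on real text may differ.

```php
<?php
// Hypothetical helper: a naive approximation of some counts Text_Statistics
// reports (words, unique words, sentences). Tokenization rules are assumptions.
function basicTextStats(string $text): array
{
    // Words: runs of letters, optionally with apostrophes ("don't"), lowercased
    preg_match_all("/[a-z']+/i", $text, $matches);
    $words = array_map('strtolower', $matches[0]);

    // Per-word frequencies, similar in spirit to the _uniques array in the dump
    $uniques = array_count_values($words);

    // Sentences: pieces separated by ., ! or ? that contain non-whitespace text
    $pieces = preg_split('/[.!?]+/', $text);
    $sentences = array_filter($pieces, function ($piece) {
        return trim($piece) !== '';
    });

    return [
        'numWords'     => count($words),
        'uniqWords'    => count($uniques),
        'numSentences' => count($sentences),
        'uniques'      => $uniques,
    ];
}

print_r(basicTextStats("This is an example."));
```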
You aren't limited to analyzing plain text, though; another PEAR package makes it easy to analyze data stored in XML.