Finding the Correct Birth Year
When the program finishes scanning the URLs identified with a famous person, you are left with a list of potential birth years. The program also tracks how many times each potential birth year occurs. It calls the getResult function to determine which year had the largest number of "votes."
The function begins by creating two variables. The result variable holds the year with the largest count found so far. The second variable, maxCount, holds the number of votes for the year currently stored in result.
int result = -1;
int maxCount = 0;
Next, it obtains a Set of the candidate birth years and loops over it, looking up the count recorded for each year. Whenever a year's count exceeds maxCount, that year becomes the new result. At the end, the result variable holds the birth year with the largest count:
Set<Integer> set = results.keySet();
for (int year : set) {
  int count = results.get(year);
  if (count > maxCount) {
    result = year;
    maxCount = count;
  }
}
If no birth years were found, then the result variable remains set to its initial value of -1, which informs the calling method that no birth year was found.
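Putting the pieces together, the vote-counting logic can be sketched as a self-contained program. This is a minimal illustration, not the article's full listing: the class name BirthYearVotes is hypothetical, and it assumes results is a Map from each candidate year to its vote count, passed in as a parameter.

```java
import java.util.HashMap;
import java.util.Map;

class BirthYearVotes {
    // Returns the year with the most votes, or -1 if the map is empty.
    // Assumption: results maps each candidate year to its vote count.
    static int getResult(Map<Integer, Integer> results) {
        int result = -1;
        int maxCount = 0;
        for (Map.Entry<Integer, Integer> entry : results.entrySet()) {
            int count = entry.getValue();
            if (count > maxCount) {
                result = entry.getKey();
                maxCount = count;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> results = new HashMap<Integer, Integer>();
        results.put(1952, 3);
        results.put(1925, 1);
        results.put(1960, 2);
        System.out.println(getResult(results));                          // prints 1952
        System.out.println(getResult(new HashMap<Integer, Integer>()));  // prints -1
    }
}
```

Because the comparison is a strict greater-than, two years with the same count are not distinguished: whichever the map happens to yield first wins, and a HashMap makes no promise about that order.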
Going on From Here
This article showed you how to create a bot that makes use of the Yahoo web services API. This bot uses the Yahoo API to find likely pages to visit. Subsequently, it uses regular Java HTTP programming to access and analyze the data contained in those pages.
Bots are usually designed to access specific data. If you need to obtain data, and that data is available on the Internet, you can probably construct a bot to obtain it. Using the Java HTTP functions, a Java program can issue the same requests that a regular web user's browser would. Creating the bot is simply a matter of reproducing the correct HTTP requests in your bot and writing the appropriate code for data recognition, extraction, and analysis.
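As a rough illustration of reproducing an HTTP request, the sketch below uses the standard java.net.HttpURLConnection class to issue a GET and read the response body. The class name PageFetcher, the User-Agent string, the five-second timeouts, and the assumption that the page is UTF-8 encoded are all choices made for this example, not details taken from the article's bot.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class PageFetcher {
    // Builds a GET request. Nothing is sent until the response is read.
    static HttpURLConnection prepare(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", "ExampleBot/1.0"); // hypothetical bot name
        conn.setConnectTimeout(5000); // milliseconds, assumed value
        conn.setReadTimeout(5000);    // milliseconds, assumed value
        return conn;
    }

    // Downloads the page body as text, assuming UTF-8 encoding.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = prepare(address);
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }
}
```

A call such as PageFetcher.fetch("http://example.com/") would return the raw HTML, which the recognition and extraction code then has to process.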
Fortunately, as you have seen, much of the code (the initial search, URL gathering, HTML stripping, sentence collection, and word tokenizing) is boilerplate; you'd write the same type of code to search for any type of data. That also means it's reusable. The only part that's not reusable is the code that identifies and analyzes the specific data you're looking for. By replacing that code with your own custom code, you have all the basic tools you need to construct your own bots to search and extract data from the web.
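One way to keep the data-specific part easy to replace is to isolate it in a single method. The sketch below is a hypothetical example of such a method, not the recognition code used by the article's bot: it pulls standalone four-digit numbers from 1000 to 2099 (an assumed range) out of a sentence with a regular expression. The names YearExtractor and extractYears are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YearExtractor {
    // Standalone four-digit number from 1000 to 2099 (assumed plausible range).
    private static final Pattern YEAR = Pattern.compile("\\b(1\\d{3}|20\\d{2})\\b");

    // Returns every candidate year in the sentence, in order of appearance.
    static List<Integer> extractYears(String sentence) {
        List<Integer> years = new ArrayList<Integer>();
        Matcher m = YEAR.matcher(sentence);
        while (m.find()) {
            years.add(Integer.parseInt(m.group(1)));
        }
        return years;
    }

    public static void main(String[] args) {
        // prints [1952, 1975]
        System.out.println(extractYears("Born in 1952, he moved away in 1975."));
    }
}
```

Retargeting the bot at a different kind of data would then mean swapping in a different pattern or method body while the search, download, and tokenizing code stays the same.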