
Until recently, people argued that Perl did not have stable multithreading support. This article presents a Perl application for which multithreading makes sense: a Web crawler with all the necessary components of a basic search engine. The downloadable code includes the MySQL database creation scripts, Perl code, and PHP interface files.
The application requirements for the example are:
- All open source
- Small footprint
- Ability to score content
- Multithreaded application
To illustrate the small footprint, the crawler, search engine, and database all run on a very old 166MHz Pentium with 32MB of RAM. It is not fast by any means, but the performance you can get from Linux on such old hardware is impressive.
Figure 1 shows the architecture used in this example and the components that make up the search engine (e.g., the dictionary hash, the multithreaded crawler, and the PHP front end used to search the database).
Figure 1: Architecture Utilized for Search Engine
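To make the relationship between the dictionary hash and the multithreaded crawler more concrete, here is a minimal sketch of worker threads pulling URLs from a shared queue and scoring content against a dictionary. It is not the article's downloadable code: the names `%dictionary`, `%pages`, `score_content`, and `worker`, the term weights, and the example URLs are all assumptions made for illustration, and an in-memory hash stands in for both the HTTP fetch and the MySQL database so the sketch runs without a network or database connection.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

# Hypothetical dictionary hash: term => weight used to score page content.
my %dictionary = (perl => 3, thread => 2, crawler => 1);

# Stand-in for downloaded pages (assumption: a real crawler would fetch these).
my %pages = (
    'http://example.com/a' => 'Perl thread basics for a crawler',
    'http://example.com/b' => 'Cooking with garlic',
);

# Results are shared across threads; a real system would write to the database.
my %scores :shared;

# Work queue of URLs. end() (Thread::Queue 3.01 or later) makes dequeue()
# return undef once the queue is empty, which lets the workers exit.
my $queue = Thread::Queue->new(sort keys %pages);
$queue->end;

# Sum the dictionary weights of every word that appears in the text.
sub score_content {
    my ($text) = @_;
    my $score = 0;
    for my $word (split /\W+/, lc $text) {
        $score += $dictionary{$word} // 0;
    }
    return $score;
}

# Each worker takes URLs off the queue until none are left.
sub worker {
    while (defined(my $url = $queue->dequeue)) {
        my $score = score_content($pages{$url});
        lock(%scores);               # serialize writes to the shared hash
        $scores{$url} = $score;
    }
}

my @workers = map { threads->create(\&worker) } 1 .. 2;
$_->join for @workers;

printf "%-22s %d\n", $_, $scores{$_} for sort keys %scores;
```

This assumes a Perl built with thread support (ithreads) and version 5.10 or later for the `//` operator. Two workers are enough to show the pattern; the pool size is a tuning choice rather than something this sketch prescribes.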
The article is organized into the following steps:
- Preliminary Setup
- Code Snippets and Explanations
- Web Interface—Searching the content
The Code Snippets and Explanations section describes the components shown in Figure 1.