Create a LAMP Search Engine Using Multithreaded Perl : Page 2

Explore the multithreaded capabilities of Perl while building a LAMP Web crawler with all the necessary components of a basic search engine.


Step 1: Preliminary Setup
To begin, you must set up your Linux system with the proper software:
  1. Apache
  2. MySQL
  3. PHP
  4. Perl (with threading support)
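
Before installing anything, it can help to see which of these pieces are already on the system. The following is a minimal sketch, not part of the original article: the function name check_tools is hypothetical, and the binary names in the example (httpd, mysql, php, perl) are assumptions that vary by distribution.

```shell
#!/bin/sh
# Report whether each named command-line tool is on PATH.
# Returns 0 only if every tool was found.
check_tools() {
    rc=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example (commented out so the snippet has no side effects when sourced):
# check_tools httpd mysql php perl
```

A tool reported as missing may simply be installed outside PATH (for example under /usr/local/apache2/bin), so treat the output as a hint rather than a verdict.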

Apache
Although you may use other versions of Apache, the PHP application in this article was developed using version 2.0.47.

MySQL
If you do not already have MySQL installed on your Linux system, you can obtain it directly from MySQL.com. If you run the crawler on the same box as your database, install the MySQL server, client, and development libraries, for example:

  • rpm -ivh MySQL-server-4.0.18-0.i386.rpm
  • rpm -ivh MySQL-client-4.0.18-0.i386.rpm
  • rpm -ivh MySQL-devel-4.0.18-0.i386.rpm



Once you've installed the MySQL server, you must create the database used in this example and apply some additional security. At the command prompt, type the following commands:

  1. mysql -u root
  2. create database search;
  3. GRANT ALL PRIVILEGES ON *.* TO search@localhost IDENTIFIED BY 'clamchowder' WITH GRANT OPTION;
  4. GRANT USAGE ON *.* TO search@localhost;

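The grants above give the search user full privileges on every database, so narrow them to meet your specific needs (for example, to the search database only) and be sure to choose a more secure password.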
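
One way to narrow the privileges is to grant them on the search database alone. The sketch below only prints the SQL so you can review it before running it; the function name grant_sql is hypothetical, and the statement uses the same GRANT ... IDENTIFIED BY form as the commands above, which suits the MySQL 4.0 release used here but is not accepted by MySQL 8.0 and later.

```shell
#!/bin/sh
# Print a GRANT statement limited to a single database.
# Arguments: database name, user name, password (all placeholders).
# The values are inserted verbatim, so pass only trusted input.
grant_sql() {
    db=$1
    user=$2
    pass=$3
    printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'localhost' IDENTIFIED BY '%s';\n" \
        "$db" "$user" "$pass"
}

# Example (review the output first, then pipe it to the client):
# grant_sql search search 'your-password' | mysql -u root
```

Printing the statement instead of executing it keeps the password out of the mysql command line and lets you check the scope of the grant before it takes effect.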

Run the search.sql script (included in the code download) at the command prompt to create the physical database structure and load some initial words into your dictionary:

  • mysql -u root -p search < /search.sql

Once completed, you can issue the show tables command from within MySQL:

mysql> show tables;
+-----------------------+
| Tables_in_search      |
+-----------------------+
| assoc_url_dictionary  |
| dictionary            |
| thread_instruction    |
| url                   |
+-----------------------+
4 rows in set (0.01 sec)
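
If you prefer a scripted check over reading the table list by eye, something like the following could work. It is a sketch under assumptions: the function name missing_tables is hypothetical, and it expects one table name per line on standard input, which is what the mysql client produces in batch mode with column names suppressed.

```shell
#!/bin/sh
# Read table names (one per line) on stdin and print every expected
# table that is absent. Returns 0 only when nothing is missing.
missing_tables() {
    found=$(cat)
    rc=0
    for t in assoc_url_dictionary dictionary thread_instruction url; do
        # -x matches whole lines, so "url" does not match "assoc_url_dictionary"
        if ! printf '%s\n' "$found" | grep -qx "$t"; then
            echo "missing table: $t"
            rc=1
        fi
    done
    return "$rc"
}

# Example:
# mysql -u root -p -N -B -e 'show tables' search | missing_tables
```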

PHP

  1. Obtain PHP from PHP.net.
  2. Once downloaded, extract the files by typing: bzip2 -dc php-4.3.6.tar.bz2 | tar -xvf -
  3. After you've extracted the files, configure PHP with Apache 2 and MySQL support. Change the path in the 'apxs2' parameter if your Apache bin directory is different: ./configure --with-mysql --with-apxs2=/usr/local/apache2/bin/apxs
  4. Add "AddType application/x-httpd-php .php" to your Apache httpd.conf file. This lets Apache know what to do with PHP files.
  5. To actually compile the PHP source, type: make
  6. To install PHP: make install
  7. Now set up your php.ini by copying the sample file shipped with the source; you can edit it later to set different PHP options: cp php.ini-dist /usr/local/lib/php.ini
  8. Set the "register_globals" parameter to "On" in /usr/local/lib/php.ini
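
To confirm that the edit to php.ini took effect, you can read the directive back from the file. This is a rough sketch rather than a full ini parser: the function name ini_value is hypothetical, it ignores lines commented out with a leading semicolon, and it does not strip trailing comments on the same line.

```shell
#!/bin/sh
# Print the value of a directive from a php.ini-style file.
# Lines starting with ';' do not match; if the key appears more
# than once, the last occurrence is printed.
ini_value() {
    file=$1
    key=$2
    sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}

# Example:
# ini_value /usr/local/lib/php.ini register_globals
```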

Perl (with Thread Support)
  1. Download Perl from Perl.org.
  2. Unpack it.
Pay attention to the following items when you configure Perl on your system:
  1. Be sure to use the "-Dusethreads" option. This is necessary for multithreading in Perl. To configure Perl with threading support on your system, type: sh Configure -Dusethreads
  2. Choose to install libperl.so when prompted. This is the shared Perl library, which lets other programs, such as the Apache Web server, embed the Perl interpreter.
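
Once the build is installed, running perl -V:usethreads prints usethreads='define'; on a threaded Perl and usethreads='undef'; otherwise. The helper below is a small sketch built on that output; the name is_threaded is hypothetical, and it takes the output as an argument so the check can be tried without touching the real interpreter.

```shell
#!/bin/sh
# Succeeds when the given `perl -V:usethreads` output reports
# that the interpreter was built with thread support.
is_threaded() {
    case $1 in
        "usethreads='define';"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# if is_threaded "$(perl -V:usethreads)"; then
#     echo "threads enabled"
# else
#     echo "rebuild Perl with -Dusethreads"
# fi
```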
After Perl is installed, install the other required modules by typing:
  • perl -MCPAN -e 'install HTML::Tagset'
  • perl -MCPAN -e 'install HTML::LinkExtor'
  • perl -MCPAN -e 'install HTML::Parser'
  • perl -MCPAN -e 'install Bundle::LWP'
  • perl -MCPAN -e 'install LWP::Parallel::UserAgent'
  • perl -MCPAN -e 'install Net::MySQL'
  • perl -MCPAN -e 'install DBI'
  • perl -MCPAN -e 'install DBD::mysql'
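
After the CPAN installs finish, a quick way to see whether each module actually loads is to ask Perl to import it and exit. The following is a sketch, not the article's method: the function name check_modules and the PERL override variable are hypothetical, and Bundle::LWP is left out of the example because it is an install bundle rather than a module you load directly.

```shell
#!/bin/sh
# Report which modules the Perl at $PERL (default: perl) can load.
# Returns 0 only if every module loaded.
PERL=${PERL:-perl}
check_modules() {
    rc=0
    for mod in "$@"; do
        # -MModule imports the module; -e 1 runs a no-op program
        if "$PERL" -M"$mod" -e 1 >/dev/null 2>&1; then
            echo "ok:      $mod"
        else
            echo "missing: $mod"
            rc=1
        fi
    done
    return "$rc"
}

# Example:
# check_modules HTML::Tagset HTML::LinkExtor HTML::Parser \
#     LWP::Parallel::UserAgent Net::MySQL DBI DBD::mysql
```

Set PERL to the full path of the threaded build (for example the one you just compiled) if the system also has an older Perl earlier on PATH.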

To clear up some possible confusion, the difference between the DBI and DBD-mysql modules is as follows:

  • The DBI (Database Interface) is a module that enables Perl programs to connect to databases through a common API.
  • The DBD-mysql module is a database driver (DBD) for MySQL.
DBI uses the DBD as a translator to talk to MySQL.


