
“Googlize” Your Java Apps to Search Billions of Web Pages

Google has introduced a Web API service that enables developers to program search engine functionality into their applications. With the Google Web APIs service, a program can query more than 2 billion Web documents quickly and easily. Applications with this functionality allow users to schedule regular search requests that can help monitor the Web for new information on a subject or offer comparative analyses of the amount of information available on different subjects over time.

The Google Web APIs service provides a SOAP (Simple Object Access Protocol) interface for searching Google’s index, retrieving Web pages from its cache, and checking the spelling of words. Queries use Google’s standard search syntax. Because the service is built on the SOAP and WSDL standards, developers can program in three environments: Java, Perl, or Visual Studio .NET. In this article, I use a sample program I coded (GoogleSearchDemo.java) to demonstrate how to use the Google Web APIs service with Java code.

Get Started
First you need to download the Google kit from http://www.google.com/apis/download.html. The free downloadable kit contains:

  • A complete API reference describing the semantics of method calls and fields
  • Sample SOAP request and response messages
  • A Google Web API WSDL file
  • A Java library, example program, and Javadoc documentation
  • A sample .NET program

Create a Google Web APIs service account. Use your account username and password to log in and get an account key. Note that Google limits each developer who registers for the Web APIs service to 1,000 queries per day.
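Because of the 1,000-query daily limit, an application that schedules searches may want to keep its own count before calling Google. The following is a minimal sketch of such a counter, not part of the Google kit: the class name DailyQueryBudget and its methods are hypothetical, and it assumes the limit resets when the local calendar date changes, which may not match how Google counts a day.

```java
import java.time.LocalDate;

/** Hypothetical helper (not in googleapi.jar): counts queries per calendar day. */
final class DailyQueryBudget {
    private final int limit;   // for example 1000, the limit quoted above
    private LocalDate day;     // the day the current count belongs to
    private int used;          // queries recorded for that day

    DailyQueryBudget(int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be non-negative");
        }
        this.limit = limit;
    }

    /** Records one query and returns true if the budget for 'today' is not yet spent. */
    synchronized boolean tryAcquire(LocalDate today) {
        if (!today.equals(day)) {  // first call, or the date has changed: start over
            day = today;
            used = 0;
        }
        if (used >= limit) {
            return false;
        }
        used++;
        return true;
    }

    /** Number of queries still available for 'today'. */
    synchronized int remaining(LocalDate today) {
        return today.equals(day) ? limit - used : limit;
    }
}
```

A caller would check tryAcquire(LocalDate.now()) before each search and skip or postpone the request when it returns false.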

Now you’re ready to dive into the code. The following classes are included in the googleapi.jar file:

  • import com.google.soap.search.GoogleSearch; - The GoogleSearch class provides access to the Google Web APIs, as well as Google search functions and cached pages via SOAP.
  • import com.google.soap.search.GoogleSearchResult; - GoogleSearchResult encapsulates the complete results from each Google Web APIs search call. You should call only the get methods on this object; the fields are filled in when a search result is returned.
  • import com.google.soap.search.GoogleSearchResultElement; - GoogleSearchResultElement contains an individual search result component of a GoogleSearchResult.
  • import com.google.soap.search.GoogleSearchFault; - GoogleSearchFault is an exception that encapsulates various errors that can result from a Google API call.

Download my sample Google Web API program, GoogleSearchDemo.java. Create an instance of the GoogleSearch class and set the key that Google has provided. Keep in mind that Google won’t let you use its search functionality until you set the key:

GoogleSearch search = new GoogleSearch();
search.setKey("yourkey");

After setting the key, set the query string for search:

search.setQueryString("cross language barriers for SOAP");

Now you need to invoke the Google search and store the return results:

GoogleSearchResult result = search.doSearch();

Next, iterate through the results:

GoogleSearchResultElement[] re = result.getResultElements();
for (int i1 = 0; i1 < re.length; i1++) {
    // Print each result title followed by a blank line
    System.out.println(re[i1].getTitle() + "\n");
}

Before compiling the code, you need to put the googleapi.jar file into your classpath.
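The titles printed by the loop above may come back as HTML, for example with <b> tags around matched query terms; the API reference in the kit is the place to confirm the exact format. If you want plain text on the console, a small helper along these lines could clean each title first. This is a sketch under that assumption: TitleCleaner and toPlainText are hypothetical names, and only a handful of common entities are decoded.

```java
/** Hypothetical helper (not in googleapi.jar): turns an HTML title into plain text. */
final class TitleCleaner {
    private TitleCleaner() { }

    /** Strips anything that looks like a tag, then decodes a few common entities. */
    static String toPlainText(String html) {
        if (html == null) {
            return "";
        }
        String text = html.replaceAll("<[^>]*>", "");
        // Decode &amp; last so that "&amp;lt;" becomes "&lt;" rather than "<"
        return text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&#39;", "'")
                   .replace("&amp;", "&");
    }
}
```

Inside the loop, you would then print TitleCleaner.toPlainText(re[i1].getTitle()) instead of the raw title.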

What If I’m Behind a Firewall?
If you are running behind a firewall, Google search will return the following SOAP exception when you try to execute it:

com.google.soap.search.GoogleSearchFault: 
[SOAPException: faultCode=SOAP-ENV:Client;
msg=Error opening socket: api.google.com;
targetException=java.lang.IllegalArgumentException:
Error opening socket: api.google.com]

To get your code to work behind a firewall proxy, you’ll need to modify the GoogleSearch class and implement the following four methods of the org.apache.soap.transport.http.SOAPHTTPConnection class:

  • public void setProxyHost(String s){}
  • public void setProxyPort(int i){}
  • public void setProxyUserName(String s){}
  • public void setProxyPassword(String s){}

If you don’t want to modify your existing class, download patgoogle.jar, which is a patch for firewall proxies. It contains the GoogleSearch.class with this added modification. Be sure to place patgoogle.jar before googleapi.jar in the classpath, since a modified GoogleSearch class exists in patgoogle.jar. Hopefully, Google will include these changes in future releases of its kit so developers don’t have to add any patches. (In the sample program, I incorporated calls for firewall proxies too. If you are not running behind a firewall, just comment those calls out.)
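To illustrate the shape of such a modification, here is a rough sketch of a search class that exposes the four proxy setters and forwards each call to its transport. It is not the code in patgoogle.jar. So that it compiles without the Apache SOAP library, it uses a hypothetical ProxyTransport interface as a stand-in for SOAPHTTPConnection, and ProxyAwareSearch is a hypothetical name too. The port range check is my own addition.

```java
/** Hypothetical stand-in for the transport; it declares only the four setters listed above. */
interface ProxyTransport {
    void setProxyHost(String host);
    void setProxyPort(int port);
    void setProxyUserName(String userName);
    void setProxyPassword(String password);
}

/** Sketch of a search class that forwards proxy settings to its transport. */
final class ProxyAwareSearch {
    private final ProxyTransport transport;

    ProxyAwareSearch(ProxyTransport transport) {
        if (transport == null) {
            throw new IllegalArgumentException("transport is required");
        }
        this.transport = transport;
    }

    void setProxyHost(String host) {
        transport.setProxyHost(host);
    }

    void setProxyPort(int port) {
        // Reject values outside the valid TCP port range before forwarding
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        transport.setProxyPort(port);
    }

    void setProxyUserName(String userName) {
        transport.setProxyUserName(userName);
    }

    void setProxyPassword(String password) {
        transport.setProxyPassword(password);
    }
}
```

A caller would set the host and port (for example, a placeholder such as proxy.example.com and 8080) before running the search, and add the user name and password only if the proxy requires authentication.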

As Easy As It Seems
Using the Google Web APIs service as I’ve demonstrated, your application can search billions of Web pages, and you don’t need to use any complicated code.
