Get Started with Google OneBox for Enterprise

Employees and clients can make better decisions, increase productivity, and realize other benefits when they can access company information (statistics, presentations, reports, etc.) accurately and in a timely fashion. Because such information evolves constantly, distilling it accurately as the evolution occurs can turn information chaos into valuable capital assets.

Accurately distilling enterprise information is a complex task that requires extracting the information from many repositories in many different formats, then exposing the formatted data using standard retrieval technologies. Enterprise search products such as Autonomy IDOL, FAST Search, Google Search Appliance, Microsoft Duet, and Yahoo! Search Subscriptions seek to prosper from this opportunity.

Using the Google Search Appliance suite, a company can expose its essential information using the same search technologies that Google uses to process global information on the web. The Google Search Appliance suite is a hardware/software encapsulation that gathers content and creates indexes to prepare data for retrieval using Google’s search technologies.

Google OneBox for Enterprise is a REST-based XML framework and application programming interface (API) that complements Google Search Appliance by facilitating access to real-time information in enterprise content repositories using a single search field or box, thus the name “OneBox.”

This article discusses OneBox for Enterprise and how you can exploit it using Java and Java EE technologies.

 
Figure 1. OneBox Processing Flow: The diagram shows how requests flow from a search client through the Google Search Appliance (or simulator) to defined OneBox modules and data stores, then back to the client as transformed, formatted results.

Introducing Google OneBox
Google OneBox for Enterprise is driven by a simple keyword-based or expression-based search interface that creates queries suitable for the various content providers. Each provider returns its query results to the Google Search Appliance, which aggregates and delivers the formatted results to search clients. OneBox formats its own results so that they appear above other search results in the hit list.

Here’s how processing flows through Google OneBox for Enterprise:

  1. A search begins when a search client enters a search query containing keywords or a search expression. That query gets transmitted to the Google Search Appliance.
  2. The Google Search Appliance tests each deployed OneBox module to determine whether the search expression matches the trigger for that module.
  3. The Google Search Appliance invokes the provider for each triggered OneBox module, passing the search expression to each provider.
  4. The provider processes the search expression, formats the results according to the schema defined in a file named oneboxresults.xsd, and passes the results back to the appliance as XML.
  5. If the OneBox module provides an XSL template, the appliance uses it to transform the XML. The appliance then passes the transformed results to the search client.

The diagram in Figure 1 illustrates the OneBox processing flow.
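
The dispatch portion of that flow (steps 2 through 4) can be sketched in a few lines of Java. This is only an illustration of the idea, not appliance or SDK code: the OneBoxFlowSketch and Module names, the trigger predicates, and the XML strings the stand-in providers return are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical stand-in for the appliance's dispatch loop; not part of the OneBox SDK. */
final class OneBoxFlowSketch {

    /** A module pairs a trigger test with a provider that returns XML for a query. */
    static final class Module {
        final String name;
        final Predicate<String> trigger;
        final Function<String, String> provider;

        Module(String name, Predicate<String> trigger, Function<String, String> provider) {
            this.name = name;
            this.trigger = trigger;
            this.provider = provider;
        }
    }

    /** Tests each module's trigger, calls the providers that match, and collects their XML. */
    static List<String> dispatch(String query, List<Module> modules) {
        List<String> results = new ArrayList<>();
        for (Module module : modules) {
            if (module.trigger.test(query)) {
                results.add(module.provider.apply(query));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Module> modules = new ArrayList<>();
        // Both providers are fakes that return a fixed, made-up XML fragment.
        modules.add(new Module("directory",
                q -> q.toLowerCase().startsWith("directory"),
                q -> "<results module=\"directory\"/>"));
        modules.add(new Module("weather",
                q -> q.toLowerCase().startsWith("weather"),
                q -> "<results module=\"weather\"/>"));
        System.out.println(dispatch("directory Doe", modules));
    }
}
```

In the real system the provider call is an HTTP request to the module's provider URL, and the collected XML is transformed before it reaches the search client.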

OneBox Modules
A conceptual abstraction of searchable information in Google OneBox for Enterprise is called a “module.” A module is defined by the following primary components:

  • Module Name, Type, and Description: Reference information for the module. The module type must be defined as either “internal” or “external.” Internal modules gather information directly from the Google Search Appliance. External modules gather information from external sources, specified by a provider URL.
  • Trigger: Keywords or a search expression that invokes data collection by the module. For example, the following trigger value fires on the keywords directory, dir, contact, or phone:
           directory|dir|contact|phone
  • Provider URL: URL of the entity responsible for resolving a data query. For example, the following defines a provider named SampleNoAuthOneBoxProvider, available at localhost, port 8080:
           http://localhost:8080/onebox/SampleNoAuthOneBoxProvider
  • Security: An optional specification of authentication parameters and rules. For example, the following user name and password supply basic authentication credentials:
           jdoe      foobar
  • Results Template: An optional XSLT template definition used to transform query results.

Google and its partners offer many pre-built OneBox modules that give easy access to common enterprise data sources, and Google also provides an API that lets you build your own OneBox modules. For example, a simple external module definition named my_onebox_example that’s accessible at http://localhost/onebox/foobarsearch, requires no authentication, is triggered by the keyword foobar, and provides no XSLT results template carries the following values:

   my_onebox_example
   This is a simple external OneBox module
   foobar
   http://localhost/onebox/foobarsearch

The schema document onebox.xsd defines complete details for module definitions.
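
To make the relationship between the module components concrete, here is a minimal Java sketch that models a module definition in memory and enforces the one rule stated above: an external module needs a provider URL. The ModuleDefinition class and its field names are hypothetical and are not part of the OneBox SDK; onebox.xsd remains the authority on what a real definition may contain.

```java
import java.util.Objects;

/** Hypothetical in-memory model of a OneBox module definition; not an SDK class. */
final class ModuleDefinition {

    enum Type { INTERNAL, EXTERNAL }

    final String name;
    final Type type;
    final String description;
    final String trigger;      // keywords or an expression, for example "foobar"
    final String providerUrl;  // needed for external modules; may be null for internal ones

    ModuleDefinition(String name, Type type, String description,
                     String trigger, String providerUrl) {
        this.name = Objects.requireNonNull(name, "name");
        this.type = Objects.requireNonNull(type, "type");
        this.description = description;
        this.trigger = trigger;
        // External modules gather data from a provider URL, so reject a missing one.
        if (type == Type.EXTERNAL && (providerUrl == null || providerUrl.isEmpty())) {
            throw new IllegalArgumentException("external modules need a provider URL");
        }
        this.providerUrl = providerUrl;
    }

    public static void main(String[] args) {
        ModuleDefinition example = new ModuleDefinition(
                "my_onebox_example",
                Type.EXTERNAL,
                "This is a simple external OneBox module",
                "foobar",
                "http://localhost/onebox/foobarsearch");
        System.out.println(example.name + " -> " + example.providerUrl);
    }
}
```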

The Google OneBox for Enterprise SDK contains sample code, documentation, and the libraries you need to build OneBox modules for the Google Search Appliance. The SDK also contains a Python-based Google Search Appliance simulator for Enterprise OneBox.

For Java developers, Google offers the OneBox Servlet Starter Kit to enable integration with Google OneBox for Enterprise and Google Search Appliance using standard Java and Java EE technologies.

Installing Google OneBox for Enterprise
The OneBox Servlet Starter Kit provides components and libraries that enable integration with Google OneBox for Enterprise using a Java servlet-based API. The kit offers a Web application archive (WAR) that can be deployed immediately to any Java EE application server to serve sample OneBox-compliant data to Google Search Appliance or the appliance simulator. The kit includes source code, Javadoc, and scripts to ease the learning curve for developing custom OneBox modules and information providers.

After downloading and installing the OneBox Servlet Starter Kit, you can follow these five steps to see results immediately in Google Search Appliance or in the appliance simulator:

  1. Deploy onebox.war to your Java EE application server
  2. Modify the providerURL element of the onebox.xml module definition file to point to the onebox.war context within your application server
  3. Install the onebox.xml module definition file into Google Search Appliance or appliance simulator
  4. Start your Java EE application server
  5. Run some sample queries that will instigate triggers defined in the OneBox module definition to see sample results
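
Step 2 is normally a quick manual edit, but if you script your deployments, the same change can be made with the standard Java XML APIs. The following is a minimal sketch under stated assumptions: the ProviderUrlUpdater class is hypothetical, the wrapper element in the sample XML is made up for the demonstration, and the target host name is a placeholder. Only the providerURL element name comes from the module definition itself.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical helper that rewrites providerURL values in a module definition. */
final class ProviderUrlUpdater {

    /** Returns the XML with the text of every providerURL element replaced by newUrl. */
    static String setProviderUrl(String moduleXml, String newUrl) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(moduleXml.getBytes(StandardCharsets.UTF_8)));
            NodeList urls = doc.getElementsByTagName("providerURL");
            for (int i = 0; i < urls.getLength(); i++) {
                urls.item(i).setTextContent(newUrl);
            }
            // An identity transform serializes the modified DOM back to text.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not update providerURL", e);
        }
    }

    public static void main(String[] args) {
        // The <module> wrapper is a made-up stand-in for the real file structure.
        String xml = "<module><providerURL>"
                + "http://localhost:8080/onebox/SampleNoAuthOneBoxProvider"
                + "</providerURL></module>";
        System.out.println(setProviderUrl(xml,
                "http://appserver.example.com:8080/onebox/SampleNoAuthOneBoxProvider"));
    }
}
```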

I’ll discuss these steps in detail in the following sections. First, however, the Google Search Appliance simulator supplied as part of the Google OneBox for Enterprise SDK warrants a discussion.

Using the Google Search Appliance Simulator
The Google OneBox for Enterprise SDK includes a Python-based simulator that facilitates enterprise search simulations. The Python simulator acts as a proxy for Google Search Appliance and the OneBox facilities.

After downloading the Google OneBox for Enterprise SDK, download the distribution of Python appropriate for your platform.

The Python simulator is OneBox-enabled with an XML file compliant with the schema found in onebox.xsd of the OneBox for Enterprise SDK. The simulator accepts queries from a Python command line, which it passes to each OneBox module configured in the onebox.xsd-compliant XML file.

Because the onebox.xml file complies with the onebox.xsd schema, you can use it to configure the simulator. This file contains a single OneBox module definition named “directory_onebox,” with roughly the following values:

   directory_onebox
   This is a sample OneBox module that queries for directory information.
   http://localhost:8080/onebox/SampleNoAuthOneBoxProvider

After installing the Google OneBox for Enterprise SDK, you can run the Python simulator to process queries as a proxy for an actual Google Search Appliance. You start the simulator with this command line:

   python onebox_simulator.py       onebox.xml --dumpOutput=1 --debug=1

In the preceding command, remember to supply the full directory name of your OneBox SDK installation wherever a path is required. Running the simulator with the --dumpOutput=1 option combines the OneBox results with the simulator’s search results. The simulator generates search results from the search.xml file, also distributed with the SDK.

To test it, enter this sample query from the Python command line:

   query: Brown

After you enter the query, the simulator transmits a request to the URL specified in the providerURL element of each module definition from the onebox.xml file.

The preceding query yields results similar to Listing 1.

In Listing 1, the results returned from the simulator query contain an XML document immediately following the line reading “printed it as.” Copy this XML document and apply the gsa_default_stylesheet.en.xsl stylesheet, found in the Google OneBox for Enterprise SDK, to the document using any XML development tool.

Now, here’s a sample query issued directly from a web browser:

   http://localhost:8080/onebox/SampleNoAuthOneBoxProvider?authType=none&apiMaj=1&apiMin=2&lang=en&query=Brown

The preceding Web browser query yields the results shown in Listing 2.
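
If you want to issue the same request from test code rather than a browser, the URL can be assembled programmatically. This is a minimal sketch: the ProviderRequestUrl class is hypothetical, and it simply reuses the parameter names and values that appear in the browser URL above (authType, apiMaj, apiMin, lang, and query), URL-encoding each value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds a provider request URL like the browser example. */
final class ProviderRequestUrl {

    /** Appends the sample request parameters and the encoded query to the provider URL. */
    static String build(String providerUrl, String query) {
        // LinkedHashMap keeps the parameters in the same order as the browser example.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("authType", "none");
        params.put("apiMaj", "1");
        params.put("apiMin", "2");
        params.put("lang", "en");
        params.put("query", query);

        StringBuilder url = new StringBuilder(providerUrl).append('?');
        boolean first = true;
        for (Map.Entry<String, String> param : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            url.append(param.getKey()).append('=').append(encode(param.getValue()));
            first = false;
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build(
                "http://localhost:8080/onebox/SampleNoAuthOneBoxProvider", "Brown"));
    }
}
```

Encoding matters as soon as a query contains spaces or reserved characters; URLEncoder turns a space into a plus sign, which is the form-encoding convention for query strings.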

 
Figure 2. Query Results in HTML: After applying the default stylesheet to the XML results returned from the simulator using the “Brown” query, here’s the resulting HTML.

When you apply the default stylesheet (gsa_default_stylesheet.en.xsl) against the preceding results document and save the result as HTML, the resulting page looks like Figure 2.

You can customize the display by editing onebox-default.xsl (which is ultimately called from gsa_default_stylesheet.en.xsl) and re-applying gsa_default_stylesheet.en.xsl against the results in your XML development tool to see the changes. When you are satisfied with the transformed results, paste the XSL file contents into your OneBox module definition XML file as the body of the results template element.

Defining Custom OneBox Modules
Creating a OneBox module is a three-step process: creating a trigger, selecting a provider, and formatting the results for output.

Creating a Trigger
A OneBox module trigger determines when the OneBox provider will be invoked. When a query matches the rules specified by the trigger, the appliance invokes the module’s provider. A trigger can be configured as one of the following:

  • Always On: The module will be invoked by every query.
  • Keyword(s): The module will be invoked by the keywords specified in the trigger definition.
  • Regular Expression: The module will be invoked when the query matches a regular expression defined by the trigger.
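
The three trigger types can be illustrated with a short Java sketch. It is not the appliance's matching logic, which this article does not spell out. The TriggerSketch class is hypothetical, and it makes two labeled assumptions: a keyword trigger fires when any whitespace-separated word of the query equals one of the pipe-separated keywords (ignoring case), and a regular-expression trigger fires when the pattern is found anywhere in the query.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of the three trigger types; not the appliance's real logic. */
final class TriggerSketch {

    enum Kind { ALWAYS_ON, KEYWORD, REGEX }

    /** Returns true when the query would fire a trigger of the given kind and value. */
    static boolean matches(Kind kind, String triggerValue, String query) {
        switch (kind) {
            case ALWAYS_ON:
                return true;
            case KEYWORD:
                // Assumption: any query word equal to any listed keyword fires the trigger.
                for (String word : query.trim().split("\\s+")) {
                    for (String keyword : triggerValue.split("\\|")) {
                        if (word.equalsIgnoreCase(keyword.trim())) {
                            return true;
                        }
                    }
                }
                return false;
            case REGEX:
                // Assumption: a partial match anywhere in the query is enough.
                return Pattern.compile(triggerValue).matcher(query).find();
            default:
                throw new IllegalArgumentException("unknown trigger kind: " + kind);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(Kind.KEYWORD, "directory|dir|contact|phone", "directory Doe"));
        System.out.println(matches(Kind.KEYWORD, "directory|dir|contact|phone", "Brown"));
        System.out.println(matches(Kind.REGEX, "^\\d{5}$", "12345"));
    }
}
```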

Selecting a Provider
The OneBox module provider is the entity that handles requests transmitted by the Google Search Appliance. The provider builds results based on the query and the query parameters.

There are two types of providers:

  • Internal: Internal providers gather information, referred to as “collections,” directly from the Google Search Appliance.
  • External: External providers gather information from external sources, specified by a provider URL.

Formatting the Results
Each provider returns results as XML. The search appliance uses XSL templates embedded in the module definition file to transform the returned XML into the final output display format.
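
You can reproduce this transformation step outside the appliance with the standard javax.xml.transform API, which is handy for previewing a template. The sketch below is an assumption-laden illustration: the ResultsTransformSketch class, the results/result/name/phone element names, and the tiny stylesheet are all made up for the demonstration and do not follow oneboxresults.xsd.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical preview helper that applies an XSL template to provider XML. */
final class ResultsTransformSketch {

    /** Applies the stylesheet to the results document and returns the output text. */
    static String transform(String resultsXml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter output = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(resultsXml)),
                    new StreamResult(output));
            return output.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up results document; real provider output follows oneboxresults.xsd.
        String xml = "<results><result><name>Brown</name>"
                + "<phone>555-0100</phone></result></results>";
        String xsl =
                "<xsl:stylesheet version=\"1.0\""
              + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
              + "<xsl:output method=\"html\"/>"
              + "<xsl:template match=\"/results\"><ul>"
              + "<xsl:for-each select=\"result\"><li>"
              + "<xsl:value-of select=\"name\"/>: <xsl:value-of select=\"phone\"/>"
              + "</li></xsl:for-each>"
              + "</ul></xsl:template></xsl:stylesheet>";
        System.out.println(transform(xml, xsl));
    }
}
```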

Creating a OneBox Provider
You can implement OneBox providers using any technology that can handle HTTP GET requests and return XML results. In Java, servlets fit these requirements. For example, the servlet shown in Listing 3 handles requests from a search appliance and returns OneBox results.

You can download the sample code for this article and test it yourself. You deploy the servlet in Listing 3 to your application server in the same manner as any standard Java servlet. Note that you should replace the body of the findModuleResults method to reflect your actual search results. The servlet adds each match to the array of ModuleResult objects returned from findModuleResults.
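
As a rough picture of what such a provider does, the sketch below shows the core request handling without the servlet plumbing: it parses a GET query string, looks up matches, and returns XML. It is not Listing 3. The DirectoryProviderSketch class, the in-memory directory, and the results/result/name/phone element names are hypothetical; only the findModuleResults method name and the query parameter echo the article. A real provider must format its output according to oneboxresults.xsd.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical provider core: parses a GET query string and returns results as XML. */
final class DirectoryProviderSketch {

    /** Made-up in-memory directory standing in for a real data store. */
    private static final Map<String, String> DIRECTORY = new LinkedHashMap<>();
    static {
        DIRECTORY.put("Jane Brown", "555-0100");
        DIRECTORY.put("John Doe", "555-0101");
    }

    /** Splits a raw query string into decoded name/value pairs. */
    static Map<String, String> parseParams(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(key), decode(value));
        }
        return params;
    }

    /** Returns {name, phone} pairs whose name contains the query, ignoring case. */
    static List<String[]> findModuleResults(String query) {
        List<String[]> matches = new ArrayList<>();
        if (query.isEmpty()) {
            return matches;
        }
        String needle = query.toLowerCase();
        for (Map.Entry<String, String> entry : DIRECTORY.entrySet()) {
            if (entry.getKey().toLowerCase().contains(needle)) {
                matches.add(new String[] {entry.getKey(), entry.getValue()});
            }
        }
        return matches;
    }

    /** Handles one request: reads the query parameter and renders the matches as XML. */
    static String handle(String rawQuery) {
        Map<String, String> params = parseParams(rawQuery);
        String query = params.containsKey("query") ? params.get("query") : "";
        StringBuilder xml = new StringBuilder("<results>");
        for (String[] match : findModuleResults(query)) {
            xml.append("<result><name>").append(escape(match[0]))
               .append("</name><phone>").append(escape(match[1]))
               .append("</phone></result>");
        }
        return xml.append("</results>").toString();
    }

    private static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(handle("authType=none&apiMaj=1&apiMin=2&lang=en&query=Brown"));
    }
}
```

In a servlet, the handle method's work would sit inside doGet, with the query string taken from the request object and the XML written to the response.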

To make the servlet available to your Google Search Appliance or appliance simulator, you must give the appliance or simulator a OneBox module configuration file that defines the information required to find the servlet. In this case, assuming that the servlet is deployed to the URL http://localhost:8080/onebox/SampleDirectory, the OneBox module configuration file could carry values as simple as the following:

   directory_onebox
   This OneBox module queries for sample directory information.
   directory|dir|phone
   http://localhost:8080/onebox/SampleDirectory
   ...

After deploying the preceding OneBox module configuration file to your Google Search Appliance or appliance simulator, the appliance or simulator can redirect requests matching the trigger supplied in the configuration file to your OneBox provider servlet. In this case, because the trigger keywords are defined as directory|dir|phone, any search containing a combination of these keywords would trigger a call to the OneBox provider deployed to the URL http://localhost:8080/onebox/SampleDirectory.

 
Figure 3. Sample Search Results: Using a simple XSL result template, here’s how the results of a search from a Web browser processed by the sample OneBox module servlet might look.

For example, the following query would trigger this provider servlet:

   directory Doe

HTML results for the query depend on the resultTemplate element contents defined in the OneBox module configuration file. A simple example in a Web browser might look like Figure 3.

Using Google Search Appliance, a company can expose vital information using the same search technologies that Google uses to process global information on the web.

Google OneBox for Enterprise is an application programming interface (API) and framework that complements Google Search Appliance by facilitating access to real-time information in enterprise content repositories using a single search box that drives queries to provider modules.
