Get Started with Google OneBox for Enterprise

Employees and clients can make better decisions, increase productivity, and realize other benefits when they can access company information (statistics, presentations, reports, etc.) accurately and in a timely fashion. Because such information evolves constantly, distilling it accurately as the evolution occurs can turn information chaos into valuable capital assets.

Accurately distilling enterprise information is a complex task: the information must be extracted from many repositories in many different formats, then exposed using standard retrieval technologies. Enterprise search products such as Autonomy IDOL, FAST Search, Google Search Appliance, Microsoft Duet, and Yahoo! Search Subscriptions seek to prosper from this opportunity.

Using the Google Search Appliance suite, a company can expose its essential information using the same search technologies that Google uses to process global information on the web. The Google Search Appliance suite is a hardware/software encapsulation that gathers content and creates indexes to prepare data for retrieval using Google’s search technologies.

Google OneBox for Enterprise is a REST-based XML framework and application programming interface (API) that complements Google Search Appliance by facilitating access to real-time information in enterprise content repositories using a single search field or box, thus the name “OneBox.”

This article discusses OneBox for Enterprise and how you can exploit it using Java and Java EE technologies.

 
Figure 1. OneBox Processing Flow: The diagram shows how requests flow from a search client through the Google Search Appliance (or simulator) to defined OneBox modules and data stores, then back to the client as transformed, formatted results.

Introducing Google OneBox
Google OneBox for Enterprise is driven by a simple keyword-based and/or expression-based search interface which then creates queries suitable for the various content providers. The search engine returns query results to a Google Search Appliance, which aggregates and delivers the formatted results to search clients. OneBox formats its own results so that they appear above other search results in the hit list.

Here’s how processing flows through Google OneBox for Enterprise:

  1. A search begins when a search client enters a search query containing keywords or a search expression. That query gets transmitted to the Google Search Appliance.
  2. The Google Search Appliance tests each deployed OneBox module to determine whether the search expression matches the trigger for that module.
  3. The Google Search Appliance invokes the provider for each triggered OneBox module, passing the search expression to each provider.
  4. The provider processes the search expression, formats the results according to the schema defined in a file named oneboxresults.xsd, and passes the results back to the appliance as XML.
  5. If the OneBox module provides an XSL template, the appliance uses it to transform the XML. The transformed results are then passed to the search client.

The diagram in Figure 1 illustrates the OneBox processing flow.

OneBox Modules
A conceptual abstraction of searchable information in Google OneBox for Enterprise is called a “module.” A module is defined by the following primary components:

  • Module Name, Type, and Description: Reference information for the module. The module type must be defined as either “internal” or “external.” Internal modules gather information directly from the Google Search Appliance. External modules gather information from external sources, specified by a provider URL.
  • Trigger: Keywords or a search expression that invokes data collection by the module. For example, the following code defines a trigger fired by any of the keywords directory, dir, contact, or phone:
         <trigger triggerType="keyword">directory|dir|contact|phone</trigger>
  • Provider URL: URL of the entity responsible for resolving a data query. For example, the following defines a provider named SampleNoAuthOneBoxProvider, available at localhost, port 8080:
         <providerURL>http://localhost:8080/onebox/SampleNoAuthOneBoxProvider</providerURL>
  • Security: An optional specification of authentication parameters and rules. For example, the following defines basic user-name and password authentication rules:
         <security userAuth="basic">
            <GSA_username>jdoe</GSA_username>
            <GSA_password>foobar</GSA_password>
         </security>
  • Results Template: An optional XSLT template definition used to transform query results.

Google and its partners offer many pre-built OneBox modules that give easy access to common enterprise data sources, and Google also provides an API that lets you build your own OneBox modules. For example, the following code is a simple external module definition named my_onebox_example that’s accessible at http://localhost/onebox/foobarsearch, requires no authentication, is triggered by the keyword foobar, and provides no XSLT results template:

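   <GoogleEnterpriseSources>
      <onebox type="external">
         <name>my_onebox_example</name>
         <description>This is a simple external OneBox module</description>
         <security userAuth="none"/>
         <trigger triggerType="keyword">foobar</trigger>
         <providerURL>http://localhost/onebox/foobarsearch</providerURL>
      </onebox>
   </GoogleEnterpriseSources>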

The schema document onebox.xsd defines complete details for module definitions.

The Google OneBox for Enterprise SDK contains sample code, documentation, and the libraries you need to build OneBox modules for the Google Search Appliance. The SDK also contains a Python-based Google Search Appliance simulator for Enterprise OneBox.

For Java developers, Google offers the OneBox Servlet Starter Kit to enable integration with Google OneBox for Enterprise and Google Search Appliance using standard Java and Java EE technologies.

Installing Google OneBox for Enterprise
The OneBox Servlet Starter Kit provides components and libraries that enable integration with Google OneBox for Enterprise using a Java servlet-based API. The kit offers a Web application archive (WAR) that can be deployed immediately to any Java EE application server, where it serves sample OneBox-compliant data to Google Search Appliance or the appliance simulator. The kit includes source code, Javadoc, and scripts to shorten the learning curve for developing custom OneBox modules and information providers.

After downloading and installing the OneBox Servlet Starter Kit, you can follow these five steps to see results immediately in Google Search Appliance or in the appliance simulator:

  1. Deploy onebox.war to your Java EE application server
  2. Modify the providerURL element of the onebox.xml module definition file to point to the onebox.war context within your application server
  3. Install the onebox.xml module definition file into Google Search Appliance or appliance simulator
  4. Start your Java EE application server
  5. Run some sample queries that will instigate triggers defined in the OneBox module definition to see sample results

I’ll discuss these steps in detail in the following sections. First, however, the Google Search Appliance simulator supplied as part of the Google OneBox for Enterprise SDK warrants a discussion.

Using the Google Search Appliance Simulator
The Google OneBox for Enterprise SDK includes a Python-based simulator that facilitates enterprise search simulations. The Python simulator acts as a proxy for Google Search Appliance and the OneBox facilities.

After downloading the Google OneBox for Enterprise SDK, download the distribution of Python appropriate for your platform.

The Python simulator is OneBox-enabled with an XML file compliant with the schema found in onebox.xsd of the OneBox for Enterprise SDK. The simulator accepts queries from a Python command line, which it passes to each OneBox module configured in the onebox.xsd-compliant XML file.

Because the onebox.xml file complies with the onebox.xsd schema, you can use it to configure the simulator. This file, defined roughly as follows, contains a single OneBox module definition named “directory_onebox.”

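   <onebox type="external">
      <name>directory_onebox</name>
      <description>This is a sample OneBox module that queries for directory information.</description>
      <providerURL>http://localhost:8080/onebox/SampleNoAuthOneBoxProvider</providerURL>
   </onebox>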

After installing the Google OneBox for Enterprise SDK, you can run the Python simulator to process queries as a proxy for an actual Google Search Appliance. You start the simulator with this command line:

   python onebox_simulator.py <SDK_DIR>/onebox.xml --dumpOutput=1 --debug=1

In the preceding command, remember to replace <SDK_DIR> with the full directory name of your OneBox SDK installation. Running the simulator with the --dumpOutput=1 option combines the OneBox results with the simulator’s search results. The simulator generates search results from the search.xml file, also distributed with the SDK.

To test it, enter this sample query from the Python command line:

   query: Brown

After you enter the query, the simulator transmits a request to the URL specified in the providerURL element of each module definition from the onebox.xml file.

The preceding query yields results similar to Listing 1.

In Listing 1, the results returned from the simulator query contain an XML document immediately following the line reading “printed it as.” Copy this XML document and apply the gsa_default_stylesheet.en.xsl stylesheet, found in the Google OneBox for Enterprise SDK, to the document using any XML development tool.

You can also send the same query directly to the provider from a web browser:

   http://localhost:8080/onebox/SampleNoAuthOneBoxProvider?authType=none&apiMaj=1&apiMin=2&lang=en&query=Brown

The preceding Web browser query yields the results shown in Listing 2.
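
The browser request above can also be assembled in code. The following is a minimal sketch that assumes only the parameter names visible in that URL (authType, apiMaj, apiMin, lang, and query). The class and method names are hypothetical and are not part of the OneBox SDK.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of the OneBox SDK) that rebuilds the
// provider request URL shown in the browser example.
class ProviderRequestUrl {

    // Appends the fixed parameters from the example plus the URL-encoded query.
    static String build(String providerUrl, String query) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("authType", "none");
        params.put("apiMaj", "1");
        params.put("apiMin", "2");
        params.put("lang", "en");
        params.put("query", query);

        StringJoiner joined = new StringJoiner("&", providerUrl + "?", "");
        for (Map.Entry<String, String> param : params.entrySet()) {
            joined.add(param.getKey() + "="
                    + URLEncoder.encode(param.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(
                build("http://localhost:8080/onebox/SampleNoAuthOneBoxProvider", "Brown"));
    }
}
```

Encoding the query matters as soon as it contains spaces or punctuation, as in a multi-word search such as directory Doe.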

 
Figure 2. Query Results in HTML: After applying the default stylesheet to the XML results returned from the simulator using the “Brown” query, here’s the resulting HTML.

When you apply the default stylesheet (gsa_default_stylesheet.en.xsl) against the preceding results document and save the result as HTML, the resulting page looks like Figure 2.

You can customize the display by editing onebox-default.xsl (which is ultimately called from gsa_default_stylesheet.en.xsl) and re-applying gsa_default_stylesheet.en.xsl against the results in your XML development tool to see the changes. When you are satisfied with the transformed results, paste the XSL template contents into your OneBox module definition XML file as the body of the results template element.

Defining Custom OneBox Modules
Creating a OneBox module is a three-step process: creating a trigger, selecting a provider, and formatting the results for output.

Creating a Trigger
A OneBox module trigger determines when the OneBox provider will be invoked. When a query matches the rules specified by the trigger, the module is invoked. A trigger can be configured as one of the following:

  • Always On: The module will be invoked by every query
  • Keyword(s): The module will be invoked by the keywords specified in the trigger definition
  • Regular Expression: The module will be invoked when the query matches a regular expression defined by the trigger
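
The appliance’s exact matching rules are not spelled out here, so the following sketch only illustrates the three trigger kinds. It assumes that a keyword trigger fires when any whitespace-separated query term equals one of the pipe-separated keywords (ignoring case), and that a regular-expression trigger fires when the pattern is found anywhere in the query. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical illustration of the three trigger kinds; not SDK code.
class TriggerSketch {
    enum Kind { ALWAYS_ON, KEYWORD, REGEX }

    private final Kind kind;
    private final String spec; // pipe-separated keywords or a regex; unused for ALWAYS_ON

    TriggerSketch(Kind kind, String spec) {
        this.kind = kind;
        this.spec = spec;
    }

    // Returns true when this trigger would invoke its module for the query.
    boolean matches(String query) {
        switch (kind) {
            case ALWAYS_ON:
                return true;
            case KEYWORD:
                // Assumption: any single query term equal to a keyword fires the trigger.
                List<String> keywords =
                        Arrays.asList(spec.toLowerCase(Locale.ROOT).split("\\|"));
                for (String term : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
                    if (keywords.contains(term)) {
                        return true;
                    }
                }
                return false;
            case REGEX:
                // Assumption: the pattern may match anywhere in the query.
                return Pattern.compile(spec).matcher(query).find();
            default:
                throw new IllegalStateException("Unknown trigger kind: " + kind);
        }
    }

    public static void main(String[] args) {
        TriggerSketch directory =
                new TriggerSketch(Kind.KEYWORD, "directory|dir|contact|phone");
        System.out.println(directory.matches("directory Doe"));
        System.out.println(directory.matches("Brown"));
    }
}
```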

Selecting a Provider
The OneBox module provider is the entity that handles requests transmitted by the Google Search Appliance. The provider builds results based on the query and the query parameters.

There are two types of providers:

  • Internal: Internal providers gather information, referred to as “collections,” directly from the Google Search Appliance
  • External: External providers gather information from external sources, specified by a provider URL

Formatting the Results
Each provider returns results as XML. The search appliance uses XSL templates embedded in the module definition file to transform the returned XML into the final output display format.
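
To see what this transformation step amounts to, you can run an XSL template against a results document yourself with the JDK’s built-in javax.xml.transform API. The sketch below is only an illustration: the XML element names and the template are made-up placeholders, not the oneboxresults.xsd schema or the SDK’s stylesheets.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo of applying an XSL template to provider XML.
class ResultTransformSketch {

    // Placeholder provider output; NOT the real oneboxresults.xsd structure.
    static final String RESULTS_XML =
            "<results><result><name>Brown, Bob</name><phone>555-0100</phone></result></results>";

    // Placeholder template that renders each result as an HTML list item.
    static final String TEMPLATE_XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"html\" indent=\"no\"/>"
          + "<xsl:template match=\"/results\"><ul>"
          + "<xsl:for-each select=\"result\">"
          + "<li><xsl:value-of select=\"name\"/>: <xsl:value-of select=\"phone\"/></li>"
          + "</xsl:for-each>"
          + "</ul></xsl:template>"
          + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the transformed text.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(RESULTS_XML, TEMPLATE_XSL));
    }
}
```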

Creating a OneBox Provider
You can implement OneBox providers using any technology that can handle HTTP GET requests and return XML results. In Java, servlets fit these requirements. For example, the servlet shown in Listing 3 handles requests from a search appliance and returns OneBox results.

You can download the sample code for this article and test it yourself. You deploy the servlet in Listing 3 to your application server in the same manner as any standard Java servlet. Note that you should replace the body of the findModuleResults method to reflect your actual search results. The servlet adds each match to the array of ModuleResult objects returned from findModuleResults.
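
As a rough idea of what a replacement lookup might involve, the following servlet-free sketch searches a tiny in-memory directory and serializes the matches as XML. It is not the code from Listing 3: the Entry class, the sample data, and the matching rule are hypothetical, and the element names (OneBoxResults, provider, MODULE_RESULT, Title, Field) are an assumption about the results format that you should check against the oneboxresults.xsd file in the SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a provider's lookup and XML serialization.
class DirectoryProviderSketch {

    // Simple value holder; the starter kit uses its own result classes.
    static final class Entry {
        final String name;
        final String phone;

        Entry(String name, String phone) {
            this.name = name;
            this.phone = phone;
        }
    }

    // Made-up sample data standing in for a real enterprise directory.
    static final List<Entry> DIRECTORY = List.of(
            new Entry("Bob Brown", "555-0100"),
            new Entry("Jane Doe", "555-0101"));

    // Returns every entry whose name contains at least one query term.
    static List<Entry> findModuleResults(String query) {
        List<Entry> matches = new ArrayList<>();
        String[] terms = query.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (Entry entry : DIRECTORY) {
            String name = entry.name.toLowerCase(Locale.ROOT);
            for (String term : terms) {
                if (!term.isEmpty() && name.contains(term)) {
                    matches.add(entry);
                    break; // add each entry at most once
                }
            }
        }
        return matches;
    }

    // Escapes the characters that are unsafe in XML text and attribute values.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Serializes matches; element names are assumed, not validated against the schema.
    static String toXml(String providerName, List<Entry> matches) {
        StringBuilder xml = new StringBuilder("<OneBoxResults>");
        xml.append("<provider>").append(escape(providerName)).append("</provider>");
        for (Entry entry : matches) {
            xml.append("<MODULE_RESULT>")
               .append("<Title>").append(escape(entry.name)).append("</Title>")
               .append("<Field name=\"phone\">").append(escape(entry.phone)).append("</Field>")
               .append("</MODULE_RESULT>");
        }
        return xml.append("</OneBoxResults>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml("Sample Directory", findModuleResults("directory Doe")));
    }
}
```

In a real provider, a servlet’s doGet method would read the query parameter from the request, call a lookup like this one, and write the XML to the response.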

To make the servlet available to your Google Search Appliance or appliance simulator, you must give the appliance or simulator a OneBox module configuration file that tells it how to find the servlet. In this case, assuming that the servlet is deployed to the URL http://localhost:8080/onebox/SampleDirectory, the OneBox module configuration file could be as simple as the following:

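   <GoogleEnterpriseSources>
      <onebox type="external">
         <name>directory_onebox</name>
         <description>This OneBox module queries for sample directory information.</description>
         <trigger triggerType="keyword">directory|dir|phone</trigger>
         <providerURL>http://localhost:8080/onebox/SampleDirectory</providerURL>
         <resultsTemplate>...</resultsTemplate>
      </onebox>
   </GoogleEnterpriseSources>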

After deploying the preceding OneBox module configuration file to your Google Search Appliance or appliance simulator, the appliance or simulator can redirect requests matching the trigger supplied in the configuration file to your OneBox provider servlet. In this case, because the trigger keywords are defined as directory|dir|phone, any search containing one or more of these keywords triggers a call to the OneBox provider deployed to the URL http://localhost:8080/onebox/SampleDirectory.

 
Figure 3. Sample Search Results: Using a simple XSL result template, here’s how the results of a search from a Web browser processed by the sample OneBox module servlet might look.

For example, the following query would trigger this provider servlet:

   directory Doe

HTML results for the query depend on the contents of the resultsTemplate element defined in the OneBox module configuration file. A simple example in a Web browser might look like Figure 3.

Using Google Search Appliance, a company can expose vital information using the same search technologies that Google uses to process global information on the web.

Google OneBox for Enterprise is an application programming interface (API) and framework that complements Google Search Appliance by facilitating access to real-time information in enterprise content repositories using a single search box that drives queries to provider modules.
