Create Scalable Semantic Applications with Database-Backed RDF Stores

I hope you’ve been following the growing interest in semantic technology, reading the articles on DevX’s Semantic Zone, and now feel ready to try out semantic technologies in your own applications. If so, you might have obtained a moderately sized RDF dataset from your domain and eagerly cranked out some simple code such as the following, which loads the freely available National Cancer Institute’s (NCI) cancer ontology using the Jena API:

   Model model =
      ModelFactory.createDefaultModel();
   model.read(getClass().getResourceAsStream(
      "/nciOncology.owl"),
      "http://www.mindswap.org/2003/nciOncology.owl#");

NCI’s cancer ontology file is only about 32 MB in size, so you might be a little startled after a few seconds when the Java VM throws an error:

java.lang.OutOfMemoryError: Java heap space

The small size of the ontology file belies the fact that it contains almost half a million statements. Yes, you could start tuning your VM settings so Java could manage the ontology completely in memory—but that approach won’t scale as you work with more and bigger ontologies. In addition, the significant start-up time involved in parsing so many statements leaves a lot to be desired.

A more practical approach is to store larger ontologies in a database-backed RDF store (referred to here as an RDF database for brevity), and then query the model much as you would a database. This article shows you how to do just that, with the goal of pointing out several technologies that you can use independently or in combination to accomplish such queries. The article will then take a deeper dive using one specific option, providing more detailed examples that show you how to use the Jena API to manage and query Jena-based RDF databases. If you’re not already familiar with concepts and tools such as RDF, SPARQL, and Jena, the resources you’ll find at the end of this article should help.

To illustrate Jena’s support for working with database-backed RDF stores, you’ll see how to create a graphical management tool for Jena RDF databases and models, the “Jena Database Model Manager,” which is simply an SWT-based wrapper for the Jena APIs.

Author’s Note: I won’t explore the SWT code here, but you can find it in the downloadable source code that accompanies this article. Figures 1, 2, and 3 show the Jena Database Model Manager in action.

 
Figure 1. Jena Database Connection Management: Users create named connections to Jena databases to which they can then connect and disconnect.
 
Figure 2. Jena Model Management: The tool provides a graphical approach toward creating, deleting, and loading Jena Models.
 
Figure 3. Querying a Jena RDF Database: The Query tab allows SPARQL queries to be executed against one or more Jena models.

RDF Store Overview
You have several options to consider when choosing an RDF database for your application. AllegroGraph is a commercial option that boasts impressive performance as well as other value-added features such as a built-in reasoner and support for federated databases. Three popular open source options are Sesame, Jena, and Mulgara.

Jena’s traditional database layout is RDB, which was optimized for the Jena Model API. A new Jena component, SDB, is being developed to offer an alternative database layout optimized for larger patterns such as those you would typically execute when performing SPARQL queries. SDB is currently in beta, so Jena references in this article refer to the RDB database layout unless otherwise noted.

The aspects you will want to consider when selecting an RDF database are:

  • Database compatibility
  • API compatibility
  • Load and query performance
  • Tool support
  • Inferencing support

AllegroGraph is itself a database and as such doesn’t rely on or truly integrate with traditional relational databases. It does, however, offer the ability to back up its operations to a relational database. Sesame and Jena, on the other hand, are not databases; they’re toolkits for working with RDF data. Both have the ability to either sit on top of popular databases (see Table 1) or use file-based and memory-based modes.

Table 1. Database Compatibility: The table shows the relational databases supported by four popular RDF databases.

               MySQL   PostgreSQL   Oracle        SQL Server   Derby   HSQLDB
Jena           Yes     Yes          Yes           Yes          Yes     Yes
Sesame         Yes     Yes          Yes           No           No      No
AllegroGraph   NA      NA           NA (backup)   NA           NA      NA
Mulgara        NA      NA           NA            NA           NA      NA

Sesame and Jena both allow you to store RDF triples. The schema that each uses to store the triples is proprietary, but each exposes an API to manage and query the stored RDF data. You can access AllegroGraph repositories through both the Sesame and Jena APIs.

With the exception of Mulgara, which Table 2 lists as having no Sesame API access, you can reach each of these RDF databases through either the Jena API or the Sesame API. The Jena Sesame Model project allows developers to access Sesame databases through Jena’s model abstraction. Conversely, the Sesame-Jena Adapter project provides access to Jena models through the Sesame API. Although you can use either, you will generally be better off using the Jena API to access Jena databases and the Sesame API to access Sesame databases. You may want to factor in this affinity when deciding what set of trade-offs to make when selecting an RDF database (see Table 2).

Table 2. API Compatibility: You can access most of the RDF databases analyzed here through both the Jena and Sesame APIs; Mulgara is the exception.

               Jena API                              Sesame API
Jena           Yes                                   Yes (via Sesame-Jena Adapter)
Sesame         Yes (via Jena Sesame Model)           Yes
AllegroGraph   Yes (via AllegroGraph interfaces)     Yes (via AllegroGraph interfaces)
Mulgara        Yes (also exposes its own JRDF API)   No

Obviously, load and query performance are among the biggest factors affecting any RDF database selection. Performance benchmarking and tuning are always very contextual regardless of the technology being considered. I urge you to perform your own benchmarks in your own network and with your own hardware, datasets, and query types. The respective RDF database providers also report their own performance benchmarks, which are worth consulting as a starting point.

Tool support is another important consideration when choosing technologies; RDF databases provide different types and levels of tooling around the core function of managing RDF data. Sesame ships with graphical tools to manage a Sesame server, and supports load, query, and explore operations via a web interface. Although Jena offers only command-line management utilities, several related projects can help you manage Jena RDF databases:

  • Joseki lets you query RDF files and databases online.
  • Twinkle provides a GUI for executing SPARQL queries against RDF files.
  • TopBraid Composer is a powerful ontology editor that can access Jena, Sesame, and AllegroGraph RDF stores.

Table 3 shows some of the available tool support for the four RDF databases discussed here.

Table 3. Tool Support: These RDF databases provide a spectrum of tooling support, from command-line utilities to graphical UIs.

               Database Management               Query Execution      Editor Integration
Jena           Command-line tools and Java API   Joseki and Twinkle   TopBraid
Sesame         GUI and Web Interface             Web Interface        TopBraid
AllegroGraph   Java, HTTP, Lisp                  Java, HTTP, Lisp     TopBraid
Mulgara        Java, Perl, iTQL                  Java, Perl, iTQL     NA

Another potentially important consideration when evaluating RDF databases is the query languages they support. All the popular RDF databases explored here offer a proprietary query language into RDF data, but not all offer support for SPARQL, an emerging standard RDF query language. Table 4 highlights the differences in support for RDF query languages among the various tools:

Table 4. RDF Query Language Support: One distinguishing characteristic of RDF databases lies in their support for SPARQL.

               Native RDF Query Language   SPARQL Support
Jena           RDQL                        Yes
Sesame         RQL                         No
AllegroGraph   SPARQL                      Yes
Mulgara        iTQL                        No

Inferencing support is yet another important characteristic to consider when selecting an RDF database. Sesame and AllegroGraph notably provide optional inferencing front ends that can dynamically create entailments during database operations and can insert these additional entailments along with the asserted statements into the database. Jena features a robust and highly configurable inference engine, but at this time you can’t configure it as a front end to an RDF database. Fortunately, there’s a relatively simple workaround; you can create your own entailments using Jena’s inference engine and add those into your RDF database explicitly.
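The idea behind this workaround, computing entailments up front and storing them next to the asserted statements, can be pictured without any Jena classes. The following is a minimal plain-Java sketch, not Jena's inference engine: the `Materializer` class is hypothetical, a triple is modeled as a three-element list, and the only rule applied is the transitivity of `rdfs:subClassOf`. The returned set holds both asserted and derived statements, which is what you would then write to the RDF database explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a triple is a 3-element list (subject, predicate, object).
final class Materializer {
    static final String SUBCLASS_OF = "rdfs:subClassOf";

    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    // Returns the asserted triples plus every rdfs:subClassOf triple
    // implied by transitivity (X sub Y, Y sub Z  =>  X sub Z).
    static Set<List<String>> materialize(Set<List<String>> asserted) {
        Set<List<String>> all = new LinkedHashSet<>(asserted);
        boolean changed = true;
        while (changed) { // repeat until a full pass derives nothing new
            changed = false;
            List<List<String>> snapshot = new ArrayList<>(all);
            for (List<String> a : snapshot) {
                if (!SUBCLASS_OF.equals(a.get(1))) continue;
                for (List<String> b : snapshot) {
                    boolean chains = SUBCLASS_OF.equals(b.get(1))
                        && a.get(2).equals(b.get(0));
                    if (chains && all.add(triple(a.get(0), SUBCLASS_OF, b.get(2)))) {
                        changed = true;
                    }
                }
            }
        }
        return all;
    }
}
```

Materializing entailments this way trades storage space and load time for simpler queries: the database never needs a reasoner at query time, but the derived statements must be recomputed whenever the asserted data changes.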

Managing Jena RDF Databases
Now that you’ve explored some of the general RDF database considerations and options, here’s a more in-depth look at using Jena. A recent DevX article described Jena well. The Model interface is a key abstraction in Jena. A Jena Model represents an RDF graph, which is composed of a set of statements, each of which is an RDF triple. There are several implementations of the Model interface, each of which works with a different type of model, such as in-memory, file based, inferencing, and database backed. This article focuses on the database-backed implementation.

 
Figure 4. Jena Model Abstraction: These simple classes provide an abstraction on top of the Jena API. They encapsulate access to the Jena APIs and are used by the SWT components to manipulate the underlying Jena data structures.

A Jena database can store multiple models, typically storing each model in its own set of tables in the database. The ModelRDB class implements database-backed models; you should create instances via the ModelFactory class. You can see how to accomplish this by exploring what happens when a user selects a named database connection and clicks the Connect button (illustrated in Figure 1) to connect to a Jena database. Although Jena will create the necessary tables to represent models in a database, you need to create the database itself ahead of time using the database’s native tools.

All the screen elements in the example Jena Model Manager application manipulate an abstraction of databases, connections, and Jena Models. These classes in turn wrap and manipulate the Jena API to perform the user’s actions. To distinguish the Jena Model interface from the Model Manager representation, this article uses the class name “AppModel” to refer to the application’s Model representation. Figure 4 illustrates this simple representation as a class diagram.

When a user selects a named database connection from the connection list and clicks the Connect button, a listener event fires that invokes the connect method of the selected Connection object. The Connection object, in turn, uses the Jena API to connect to a Jena database and verify the state of the connection as shown below:

   // from com/devx/tools/jena/manager/domain/Connection.java
   import com.hp.hpl.jena.db.DBConnection;
   import com.hp.hpl.jena.db.IDBConnection;

   public class Connection
   {
      private String driver;
      private String url;
      private String user;
      private String password;
      private String databaseType;

      // The underlying Jena database connection
      private IDBConnection conn;

      ...

      public void connect() throws ClassNotFoundException
      {
         // Load the JDBC driver class so it registers with DriverManager
         Class.forName(driver);
         conn = new DBConnection(url, user, password, databaseType);
         testConnection();
      }

      private void testConnection()
      {
         // Forces an actual connection attempt to the database
         conn.getAllModelNames();
      }

      ...
   }

As is typical in JDBC, the code loads the database driver class from the classpath, so that subsequent requests to the DriverManager can construct instances of the specified driver. Next, it creates a Jena DBConnection object to represent the database connection parameters. Constructing a Jena DBConnection object simply creates a connection specification; it doesn’t attempt to connect to the underlying database. Invoking the getAllModelNames method on the connection forces the connection attempt to the database so that errors in the connection parameters can be uncovered early.
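The same fail-fast idea applies to plain JDBC. The sketch below is a standalone illustration rather than part of the Model Manager: the `ConnectionProbe` class and its method names are hypothetical, and it uses only `java.sql` from the standard library. It separates the two failure modes touched on above, a driver class that cannot be loaded and a connection that cannot be opened.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper for checking connection parameters early.
final class ConnectionProbe {
    // True if the named class can be loaded from the classpath.
    static boolean driverAvailable(String driverClassName) {
        try {
            Class.forName(driverClassName);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Opens and immediately closes a connection.
    // Returns null on success, or the error message on failure.
    static String probe(String url, String user, String password) {
        try (Connection c = DriverManager.getConnection(url, user, password)) {
            return null;
        } catch (SQLException e) {
            return e.getMessage();
        }
    }
}
```

Running a check like this when the user clicks Connect reports a mistyped driver name or URL right away, instead of on the first model operation. Note that JDBC 4.0 and later drivers register themselves automatically, so the explicit `Class.forName` call matters mainly for older drivers.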

Initially, the database will contain no models when the user clicks the Models tab shown in Figure 2. When a user clicks the Create Model button a listener fires that prompts the user to enter a name for the new model, constructs a new AppModel object, and invokes its createModel method. This method in turn uses the Jena API to create an empty model in the database as shown below:

   // from com/devx/tools/jena/manager/domain/AppModel.java
   import com.hp.hpl.jena.db.IDBConnection;
   import com.hp.hpl.jena.db.ModelRDB;
   import com.hp.hpl.jena.rdf.model.ModelFactory;
   import com.hp.hpl.jena.rdf.model.ModelMaker;

   public class AppModel
   {
      private final Database database;
      private final String name;

      public AppModel(Database database, String name)
      {
         this.database = database;
         this.name = name;
      }

      public void createModel()
      {
         ModelRDB.createModel(getJenaConnection(), name);
      }

      private IDBConnection getJenaConnection()
      {
         return database.getConnection().getJenaConnection();
      }
      ...
   }

Subsequently, when users select the Load link next to the newly created model, they see a prompt where they can specify an RDF file to load and a base URI for the data to be loaded. That information is then passed to the AppModel’s load method, which uses the Jena API to load the data into the underlying model as shown below. The load operation has been wrapped with a Jena transaction to improve performance and provide more reliable handling of error conditions:

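   // from com/devx/tools/jena/manager/domain/AppModel.java
   import java.io.InputStream;
   import com.hp.hpl.jena.rdf.model.Model;
   import com.hp.hpl.jena.rdf.model.ModelFactory;
   import com.hp.hpl.jena.rdf.model.ModelMaker;

   public class AppModel
   {
      ...
      public void load(InputStream data, String baseUri)
      {
         ModelMaker maker = ModelFactory.createModelRDBMaker(
            getJenaConnection());
         // Open the named model; with strict == false it is
         // created if it does not already exist
         Model model = maker.openModel(
            name, false);

         model.begin();
         try
         {
            model.read(data, baseUri);
            model.commit();
         }
         catch (Throwable e)
         {
            // Roll back the partial load and surface the failure
            model.abort();
            throw new RuntimeException(e);
         }
      }
      ...
   }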
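The begin/commit/abort shape of the load method does not depend on Jena. As a minimal sketch, assuming a hypothetical `Transactional` interface that stands in for the three transaction methods used above, the pattern can be factored into a small reusable helper:

```java
// Hypothetical stand-in for any resource exposing begin/commit/abort.
interface Transactional {
    void begin();
    void commit();
    void abort();
}

final class Transactions {
    // Runs the work inside a transaction: commits if the work succeeds,
    // aborts and rethrows if the work (or the commit) fails.
    static void run(Transactional tx, Runnable work) {
        tx.begin();
        try {
            work.run();
            tx.commit();
        } catch (RuntimeException | Error e) {
            tx.abort();
            throw e;
        }
    }
}
```

Unlike the load method, this helper rethrows the original exception instead of wrapping it in a RuntimeException; either choice works as long as the abort happens before the failure propagates.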

This persistent model is now populated with data and can be consumed just like any other Jena model through the Jena or ARQ (for SPARQL queries) APIs. The Query tab shown in Figure 3 lets users select one or more models and execute SPARQL queries against the selected models. After a user clicks the Execute button, the following code adds all the selected models to an application QueryExecutor object (see Listing 1) that nests the selected models as submodels under a composite parent, and then executes the query against the composite:

   // from com/devx/tools/jena/manager/views/QueryTab.java
   public class QueryTab
   {
      ...
      class ExecuteButtonListener extends SelectionAdapter
      {
         @Override
         public void widgetSelected(SelectionEvent e)
         {
            resultsTable.removeAll();
            Collection selectedModels =
               Arrays.asList(modelsList.getSelection());

            QueryExecutor executor = new QueryExecutor();
            for (AppModel appModel : Application.database.listModels())
            {
               if (selectedModels.contains(appModel.getName()))
               {
                  executor.addModel(Application.database, appModel);
               }
            }

            ResultSet results = executor.execute(queryTextBox.getText());
            // Format results
            ...
         }
      }
      ...
   }

There are other, more efficient approaches for doing this that are beyond the scope of this article. If you are looking for ways to further optimize complex queries across multiple models, check out the com.hp.hpl.jena.graph.compose package in the Jena Javadocs.
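To make the idea of querying several models as one concrete, here is a small plain-Java sketch rather than Jena or ARQ code. It assumes a hypothetical `UnionQuery` class in which each model is a set of three-element lists and a `null` in the pattern acts as a wildcard, loosely analogous to a variable in a SPARQL triple pattern. The pattern is matched against each selected model in turn and the results are merged, so the models themselves are never copied into one combined set.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a triple is a 3-element list (subject, predicate, object).
final class UnionQuery {
    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    // True if the triple matches the pattern; a null position matches anything.
    static boolean matches(List<String> t, String s, String p, String o) {
        return (s == null || s.equals(t.get(0)))
            && (p == null || p.equals(t.get(1)))
            && (o == null || o.equals(t.get(2)));
    }

    // Matches the pattern against every selected model and merges the
    // results; a statement present in several models is returned once.
    static Set<List<String>> find(Collection<Set<List<String>>> models,
                                  String s, String p, String o) {
        Set<List<String>> results = new LinkedHashSet<>();
        for (Set<List<String>> model : models) {
            for (List<String> t : model) {
                if (matches(t, s, p, o)) {
                    results.add(t);
                }
            }
        }
        return results;
    }
}
```

A real query engine evaluates whole graph patterns with joins and indexes rather than scanning every statement, which is where the more efficient approaches mentioned above come in.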

As you can see, the Jena API for managing RDF databases is straightforward and easy to use. The core of the Jena Model Manager application you have been exploring is very small, and it provides a nice example of how to use the Jena API to manage Jena databases and models. As an added benefit, the running application available for download should make Jena RDF databases more visible and easier to manage.

Although this introduction to RDF databases has been relatively brief, you have seen a high-level overview of the various RDF-database tools and technologies available, as well as more specific examples that use the Jena API to create, load, and query Jena RDF databases. You can apply these same concepts to other RDF databases and APIs.

Author’s Note: Dave Reynolds and Andy Seaborne from the Jena team graciously contributed thoughts and suggestions to this article, and I am very thankful for their input.

Related Resources
