Meet Jena, a Semantic Web Platform for Java

Java programmers who want to develop semantic web applications have a growing range of tools and libraries to choose from. One such tool, the Jena platform, is an open source toolkit for processing Resource Description Framework (RDF), Web Ontology Language (OWL), and other semantic web data. Specifically, this discussion will introduce you to Jena’s Model abstraction, which provides the container interface for collections of RDF triples: pieces of data linked by named relationships.

Model is one of the key components of Jena’s approach to handling RDF data. You’ll explore its core capabilities along with some of the extensions of the basic Model that are built in to Jena to give you a working knowledge of Jena code that will load, process, query, and write RDF data and ontologies.

Jena is a free, open source (under a liberal BSD license) Java platform for processing semantic web data. In this case semantic web particularly refers to the approach based on the World Wide Web Consortium (W3C) Semantic Web standards, especially RDF, OWL, and SPARQL. Note that W3C strictly produces recommendations rather than standards, but the nuances of that difference are beyond the scope of this discussion.

One of Jena’s original goals was to support the W3C standards as faithfully as possible, and that principle remains one of the platform’s key values today. Jena grew out of a research activity at HP Labs, during the period when the current releases of RDF and OWL were being standardized. The Jena 2 release series began in 2003; the latest version at the time of this writing is Jena 2.5.3. Jena has been actively maintained and developed since then by the team at HP Labs and contributors from the community.

The heart of Jena is a Java library for semantic web data handling. The Jena SourceForge site, however, lists a number of other related tools and APIs for assisting developers to build and manage semantic web applications.

RDF Triples and Graphs
Lao Tzu is said to have written that a journey of a thousand miles begins with a single step. With RDF, that single step is the triple. In essence, a triple is two pieces of data that are linked by a named relationship. For example:

Lisa Gerrard performs-track "Sacrifice"

Logically, a triple is a simple statement about the truth of some proposition: in this case, that the binary predicate performs-track is true of the arguments Lisa Gerrard and Sacrifice. (Note that some details in this example were omitted for clarity; in fact, the track “Sacrifice” is performed by Lisa Gerrard and Pieter Bourke on the album Duality.) RDF calls the first of these arguments the triple’s subject, and the second is its object.

So far so good. But what are the arguments exactly? RDF distinguishes two kinds of elements that can appear in triples: literals and resources. A literal is just a piece of data: an integer, a string, a floating-point number, or even an XML structure. A resource identifies something (or someone), about which we make semantically meaningful statements. That “something” might be a report, a stock trade, or, in this example, a recording artist.
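The subject-predicate-object structure can be sketched in a few lines of plain Java. This is only an illustration of the idea, not Jena's API: the Node and Triple types are hypothetical, and the http://example.org/music# namespace is an assumption made up so the example URIs are well formed.

```java
/** Plain-Java sketch of an RDF triple; these types are illustrative, not Jena classes. */
final class TripleSketch {

    /** A node is either a resource (named by a URI) or a literal (a plain data value). */
    record Node(String value, boolean isLiteral) {
        static Node resource(String uri) { return new Node(uri, false); }
        static Node literal(String data) { return new Node(data, true); }
    }

    /** Two nodes linked by a named relationship: subject, predicate, object. */
    record Triple(Node subject, String predicate, Node object) {
        /** Render resources in angle brackets and literals in double quotes. */
        String describe() {
            String obj = object.isLiteral()
                ? "\"" + object.value() + "\""
                : "<" + object.value() + ">";
            return "<" + subject.value() + "> <" + predicate + "> " + obj;
        }
    }

    // Hypothetical namespace, assumed only for this example.
    static final String EX = "http://example.org/music#";

    /** The "Lisa Gerrard performs-track Sacrifice" statement from the text. */
    static Triple example() {
        return new Triple(
            Node.resource(EX + "LisaGerrard"),
            EX + "performs-track",
            Node.literal("Sacrifice"));
    }

    public static void main(String[] args) {
        System.out.println(example().describe());
    }
}
```

Here the artist is a resource, because further statements can be made about her, while the track title is treated as a literal, matching the quoted string in the example above.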

Putting Resources in Context
In RDF a named resource has a URI as an identifier. The semantic web operates in open, web-scale systems by design. It’s clearly important to avoid accidentally naming resources using the same identifier that someone else has used for a different concept. However, it should be possible to be able to make statements independently about the same object in different source contexts. While a music publisher may use a resource to list an artist’s recording catalog, another user might use the same URI to review recordings, or publicize an upcoming concert. This open linkability is a central value for semantic web applications.

For similar reasons, RDF also uses URIs to denote predicate names. Since URIs are usually quite long strings, RDF borrows a technique from XML to abbreviate them as namespace:name pairs (for example, foaf:Person), which makes it easier to write down examples compactly.
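As a rough illustration of how a namespace:name abbreviation maps back to a full URI, here is a minimal plain-Java sketch. The PrefixSketch class and its expand() method are hypothetical helpers written for this example rather than part of Jena; the three namespace URIs are the published ones for FOAF, RDF, and RDFS.

```java
import java.util.Map;

/** Sketch of namespace:name expansion; a hypothetical helper, not a Jena class. */
final class PrefixSketch {

    // Prefix table: abbreviation -> namespace URI.
    static final Map<String, String> PREFIXES = Map.of(
        "foaf", "http://xmlns.com/foaf/0.1/",
        "rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs", "http://www.w3.org/2000/01/rdf-schema#");

    /** Expand "prefix:name" to a full URI; return the input unchanged if the prefix is unknown. */
    static String expand(String abbreviated) {
        int colon = abbreviated.indexOf(':');
        if (colon < 0) {
            return abbreviated;
        }
        String namespace = PREFIXES.get(abbreviated.substring(0, colon));
        return namespace == null
            ? abbreviated
            : namespace + abbreviated.substring(colon + 1);
    }

    public static void main(String[] args) {
        System.out.println(expand("foaf:Person"));
    }
}
```

Strings whose prefix is not in the table, including full URIs that begin with http:, pass through unchanged.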

A literal can only be the object of a subject-predicate-object triple, but a resource can be either a subject or an object. In fact, the same resource can be a subject or an object (or even both) for any number of triples. Drawing out this scenario visually produces a directed, labeled graph, and collections of RDF statements are typically called graphs. Figure 1 demonstrates a simple example of a graph that uses data from the Friend-of-a-Friend (FOAF) vocabulary.


Figure 1. FOAF Graph: A labeled graph is a collection of RDF statements.

Each arc or edge in the graph represents an RDF statement. The graph shown in Figure 1 contains seven triples in total, and of the seven nodes in the graph four are literals and three are resources. Only one of the resources, the one denoting the Person type (or, in RDF parlance, class), is labeled with a URI. The other two resources are anonymous nodes, which are sometimes called bNodes.

Tradition dictates that the first tutorial program is some variant of HelloWorld. Here, HelloWorld.java, available for download, performs three simple tasks: reading in a FOAF RDF document similar to the one shown in Figure 1, counting the triples, and writing it out again in a different format.

The input file is encoded in an XML dialect used to represent RDF. Despite being part of the RDF standard, RDF/XML is widely held to be a human-unfriendly syntax compared to other XML dialects because the serialization needs to encode the subject-predicate-object structure of RDF and is intended to be consumed by a computer, not a human. N3, on the other hand, is a textual format that is rather more compact and human friendly. The HelloWorld program will consume RDF/XML and produce N3:

public class HelloWorld {
  public static final String FOAF_FILE =
    "http://www.hpl.hp.com/people/Ian_Dickinson/foaf.rdf";
  private static Logger log = Logger.getLogger( HelloWorld.class );
  public static void main( String[] args ) {
    // create the simplest model there is
    Model m = ModelFactory.createDefaultModel();
    // use the file manager to read an RDF document into the model
    FileManager.get().readModel( m, FOAF_FILE );
    log.debug( "We have loaded a model with no. statements = " + m.size() );
    // write the model out in a different format
    m.write( System.out, "N3" );
  }
}

Jena’s Model Abstraction
Jena represents an RDF graph as an instance of the class Model. Abstractly, a Model holds a set of Statement objects, each of which is one RDF triple. Of course, the storage schemes vary for different types of Model, and these schemes try to be both compact and efficient to process under a range of usages.

Model itself is a Java interface, not a class, which allows different types of triple store to present the same interface to user code, as you’ll see in upcoming examples. The consequence, though, is that client code can’t construct a Model directly. Instead, application code invokes the ModelFactory. The simplest kind of model, which uses an in-memory storage model and has no inference or any other cleverness, is created by ModelFactory.createDefaultModel().

Looking at the Javadoc, it’s quickly clear that Model has a daunting number of methods. However, they group into a small number of operation types:

  • I/O operations for reading and writing RDF documents in a variety of syntaxes
  • Query operations for listing the resources and statements contained in a model
  • Update operations for creating and removing statements
  • Support for RDF’s reification capability (making statements about statements)
  • Utilities, such as transaction support, critical section locking, and so on

One class of operations that’s not in the Model API is in-place updating of resources, statements, and literals. There’s no setURI() in the Resource API. All of Jena’s key abstractions are immutable, and update can be achieved only by removing the old and adding the new.
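The remove-then-add pattern can be illustrated without Jena at all. The following is a minimal sketch in plain Java, assuming a graph is simply a set of immutable triples; the class, the replaceObject() helper, and the ex:track_1 identifier are all hypothetical and exist only for this example.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Sketch of "update = remove old + add new" over immutable triples; not Jena's API. */
final class ImmutableUpdateSketch {

    /** An immutable statement: there are no setters, so it can never change in place. */
    record Triple(String subject, String predicate, String object) {}

    private final Set<Triple> statements = new LinkedHashSet<>();

    void add(Triple t) { statements.add(t); }
    void remove(Triple t) { statements.remove(t); }
    boolean contains(Triple t) { return statements.contains(t); }
    int size() { return statements.size(); }

    /** "Changing" an object value means removing one statement and adding another. */
    void replaceObject(Triple old, String newObject) {
        remove(old);
        add(new Triple(old.subject(), old.predicate(), newObject));
    }

    /** Correct a misspelled title on a hypothetical resource, ex:track_1. */
    static ImmutableUpdateSketch demo() {
        ImmutableUpdateSketch graph = new ImmutableUpdateSketch();
        Triple misspelled = new Triple("ex:track_1", "dc:title", "Sacrifise");
        graph.add(misspelled);
        graph.replaceObject(misspelled, "Sacrifice");
        return graph;
    }

    public static void main(String[] args) {
        System.out.println("Statements after update: " + demo().size());
    }
}
```

Because the statement count is the same before and after, the update looks like an in-place edit from the outside, even though no individual statement was ever modified.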

This next example shows a few of the Model API methods in action. The objective this time is to list every resource of rdf:type FOAF Person and then for each of those show the person’s name, if it is known:

protected void run() {
  // use the file manager to create a plain model holding the sample FOAF file
  Model m = FileManager.get().loadModel( HelloWorld.FOAF_FILE );
  listPeople( m );
}
/** Print out all named resources of type foaf:Person */
protected void listPeople( Model m ) {
  // get all resources of type foaf:Person
  Resource personClass = m.getResource( FOAF_NS + "Person" );
  ResIterator i = m.listSubjectsWithProperty( RDF.type, personClass );
  // for each person, show their foaf:name if known
  Property name = m.getProperty( FOAF_NS + "name" );
  Property firstName = m.getProperty( FOAF_NS + "firstName" );
  while (i.hasNext()) {
    Resource person = i.nextResource();
    Statement nm = person.getProperty( name );
    nm = (nm == null) ? person.getProperty( firstName ) : nm;
    if (nm != null) {
      System.out.println( "Person named: " + nm.getString() );
    }
  }
}

Arguably, accessing the Model API this way provides a fairly low-level view of the RDF graph. Other query techniques, such as the SPARQL query language, provide a more compact and powerful query notation, but that’s material for a future article.

In principle, there’s no difference in accessing a simple memory model or any of the more sophisticated model variants. Some of these variants will be explored shortly, but it’s worth remembering that whether the model is an unadorned, in-memory data structure; an inference engine; or a view of a relational database, the core API remains the same. This consistency underpins some useful modularity in Jena. The SPARQL query engine, for example, runs queries against any model; to perform SPARQL queries over the logical entailments of an ontology, simply pass the SPARQL engine an inference-backed Model.

File-Backed and Database Models
The Model API has operations to read a document from a URL (including a file URL), and to write it again to an output stream. Serializing a model to and from a file is straightforward. This serialization, however, means that ensuring a model is saved is the client’s responsibility. You could argue that doing so undermines a clean separation of concerns. Jena provides a simple persistence solution for models that save their state in a file. This next code example shows how to create models that automatically save their state when closed. Each time the program runs, a new timestamp value is added to the persistent model state:

public static final String DATA_DIR = "./data";
public static final String NS = "http://devx.com/examples/jena#";
public static final String EX3 = "example3";
public void run() {
  Model m = getModel();
  // add a new timestamp
  Literal now = m.createTypedLiteral( Calendar.getInstance() );
  m.getResource( NS + EX3 ).addProperty( DC.date, now );
  // as a test, write the model out in N3 format
  m.write( System.out, "N3" );
  // closing the model is the cue to save the persistent state
  m.close();
}
/** Answer a model backed by a persistent file store */
protected Model getModel() {
  ModelMaker maker = ModelFactory.createFileModelMaker( DATA_DIR );
  // get the existing model with that name if we can
  Model m = maker.getModel( EX3 );
  // if we didn't find it, create a new one
  if (m == null) {
    m = maker.createModel( EX3 );
    m.withDefaultMappings( PrefixMapping.Standard );
  }
  return m;
}

Saving a model to a file is a very lightweight form of persistence, but it can be very useful in some situations. More often, however, when the design calls for a persistent model it’s because the application has very large volumes of data, or requires transactional support. In those cases, a database is usually needed.

Jena ships with support for a wide range of standard database engines, including MySQL, PostgreSQL, SQL Server, Oracle, and Derby. “Support” here means that Jena’s database adapters use standard JDBC drivers to manage those database engines as triple stores. In particular, it means that Jena will create and manage its own table layout in the database, rather than use existing tables. There are ways of treating normal relational tables as triple sources, but the core relational model support in Jena only manages Jena-specific tables. These tables can store any number of models, as long as each model has a distinct name.

The Jena adapters support transactions if the JDBC driver supports them, and they hide the variations of SQL syntax in the different databases from the Model interface. When a Jena application calls, for example, listStatements() on a database-backed model, Jena will construct the appropriate SQL query, execute it against the database engine, and translate the ResultSet into an iterator over Statement objects.

There is no requirement for a database to be pre-initialized. If a Jena RDBModel is connected to a database that doesn’t have the Jena-specific table layout, Jena will auto-initialize the database tables there and then. Of course, in a production environment it might be advisable to include an explicit initialization step in the automated build/test/deploy scripts as fits the organization’s local policies.

As a basic illustration, here is a Jena RDBModel connected to a MySQL database:

@Override
public void run() {
  ModelMaker maker = getModelMaker();
  Model m;
  if (!maker.hasModel( MODEL_NAME )) {
    // we have not loaded this model yet; do so now and read some content
    System.out.println( "Loading model content - one time only" );
    m = maker.createModel( MODEL_NAME );
    FileManager.get().readModel( m, HelloWorld.FOAF_FILE );
  }
  else {
    m = maker.getModel( MODEL_NAME );
  }
  listPeople( m );
}
/** Answer a ModelMaker for connecting to RDB models */
protected ModelMaker getModelMaker() {
  try {
    Class.forName( "com.mysql.jdbc.Driver" );
  }
  catch (Exception e) {
    e.printStackTrace();
  }
  IDBConnection conn = new DBConnection( DB_URL, DB_USER, DB_PW, DB_TYPE );
  return ModelFactory.createModelRDBMaker( conn );
}

Ontology Models
The examples up to this point have looked at RDF models, concentrating on the simple binary predicates and triples that represent semi-structured data in graphs. The W3C Semantic Web standards define two other languages, RDF Schema (RDFS) and OWL, which extend RDF and add considerably more power for describing and modeling information and applications. These languages provide a basis for developing ontologies. Ontologies describe what is true in principle about the domain applications being worked on.

For example, consider that a given person resource may have a name property. RDF can represent that individual and that individual’s name, but RDF by itself can’t state in principle that it’s true all persons have names, or that any resource that has a social security number must therefore represent a person. Ontologies provide a formal (that is, mathematical) description of the concepts in a given application domain, and they can represent things that are true in principle.

Using and applying ontologies is really a subject for another article. Continuing the theme of this discussion, you should observe that the W3C ontology languages, RDFS and OWL, connect to RDF in two ways. First, they make semantic statements about RDF instance data. Second, they also use RDF to represent the ontology itself. This idea is powerful (and sometimes confusing). Concept descriptions represent what the application can state about RDF instance data. These concept descriptions are themselves written down in RDF, using a special set of reserved resource and property URIs. The concept description, called a class in ontology languages (and not to be confused with a class in OO programming languages), can be expressed by a set of special triples. Note that it really is a set of triples: there’s more to say than can be expressed by just one RDF statement.

The essential simplicity of RDF’s triple-centric representation can lead to a can’t-see-the-woods-for-all-the-trees problem with ontologies. Class descriptions are confusing to manipulate triple by triple, and it’s easy to introduce errors. Jena’s ontology model API tries to help by extending Model, Resource, and so on with Java classes that encapsulate a more abstract view of what’s really being expressed. Instead of seeing the raw triples attached to a resource representing an ontology class, the Java abstraction OntClass provides a convenient way of processing those triples while keeping them out of the programmer’s way. As just one example, in RDF you can search for triples with the predicate rdfs:subClassOf, from which the subclass relationships between class resources can be extracted. Or, through OntClass the super- and subclasses can be listed directly with OntClass.listSuperClasses() and .listSubClasses(), respectively.
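To make the triple-level approach concrete, here is a minimal plain-Java sketch of extracting subclass relationships by scanning for rdfs:subClassOf triples. It is an assumption-laden illustration rather than Jena code: the Triple type and helper methods are hypothetical, the ex:Artist and ex:Singer classes are invented sample data, and only directly asserted relationships are followed.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of reading class hierarchy from raw triples; not Jena's OntClass implementation. */
final class SubClassSketch {

    record Triple(String subject, String predicate, String object) {}

    static final String SUB_CLASS_OF = "rdfs:subClassOf";

    /** Direct subclasses: subjects of "X rdfs:subClassOf cls" triples. */
    static List<String> listSubClasses(List<Triple> graph, String cls) {
        List<String> result = new ArrayList<>();
        for (Triple t : graph) {
            if (t.predicate().equals(SUB_CLASS_OF) && t.object().equals(cls)) {
                result.add(t.subject());
            }
        }
        return result;
    }

    /** Direct superclasses: objects of "cls rdfs:subClassOf X" triples. */
    static List<String> listSuperClasses(List<Triple> graph, String cls) {
        List<String> result = new ArrayList<>();
        for (Triple t : graph) {
            if (t.predicate().equals(SUB_CLASS_OF) && t.subject().equals(cls)) {
                result.add(t.object());
            }
        }
        return result;
    }

    /** Invented sample data: ex:Singer is an ex:Artist, which is a foaf:Person. */
    static List<Triple> sample() {
        return List.of(
            new Triple("ex:Artist", SUB_CLASS_OF, "foaf:Person"),
            new Triple("ex:Singer", SUB_CLASS_OF, "ex:Artist"),
            new Triple("ex:Singer", "rdfs:label", "Singer"));
    }

    public static void main(String[] args) {
        System.out.println(listSubClasses(sample(), "foaf:Person"));
        System.out.println(listSuperClasses(sample(), "ex:Singer"));
    }
}
```

Even this small example has to filter out unrelated triples by hand, which is the kind of bookkeeping a class-level abstraction keeps out of the programmer's way.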

Inference (Reasoning) Models
The real power of formal ontologies comes when they are combined with reasoning algorithms to infer things that are true about some collection of data, but not written down explicitly, and perhaps not even obvious. There are many approaches to reasoning, but Jena uses a particular pattern for all of the inference procedures it supports. In essence, Jena views a reasoner as an automated way to add more triples (entailments) to a base model. For example, suppose an application has only these two facts:

project:member_10 foaf:name "Julie".
foaf:name rdfs:domain foaf:Person.

There are two triples here. One is that a resource with the URI project:member_10 has the FOAF name Julie, and the other is that the FOAF name is a property with a domain class of FOAF Person. Unlike a schema or constraint language, the fact that member_10 is not known to be a Person doesn’t imply a violation. In RDFS and OWL it means that you are entitled to infer, or entail, the statement that member_10 is, in fact, a Person.

This triple is therefore entailed from the two previous triples (together with a semantic rule about the meaning of a property’s domain class):

project:member_10 rdf:type foaf:Person.

In a Jena inference model only the base statements are asserted, but when inspected the model appears also to contain the entailments, just as though those triples had also been asserted.
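The domain rule used in this example is simple enough to sketch in plain Java. The code below is a toy illustration of the idea under stated assumptions, not how Jena's rule engine is implemented: it applies only the rdfs:domain rule, makes a single pass over the base triples, and uses a hypothetical Triple type with abbreviated names in place of full URIs.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy illustration of the rdfs:domain entailment rule; not Jena's reasoner. */
final class DomainInferenceSketch {

    record Triple(String subject, String predicate, String object) {}

    /**
     * Return the base triples plus any rdf:type triples entailed by rdfs:domain:
     * if "P rdfs:domain C" and "S P O" are both present, then "S rdf:type C" follows.
     */
    static Set<Triple> withEntailments(Set<Triple> base) {
        Set<Triple> result = new LinkedHashSet<>(base);
        for (Triple domain : base) {
            if (!domain.predicate().equals("rdfs:domain")) {
                continue;
            }
            for (Triple use : base) {
                if (use.predicate().equals(domain.subject())) {
                    result.add(new Triple(use.subject(), "rdf:type", domain.object()));
                }
            }
        }
        return result;
    }

    /** The two base facts from the text. */
    static Set<Triple> baseFacts() {
        Set<Triple> base = new LinkedHashSet<>();
        base.add(new Triple("project:member_10", "foaf:name", "Julie"));
        base.add(new Triple("foaf:name", "rdfs:domain", "foaf:Person"));
        return base;
    }

    public static void main(String[] args) {
        for (Triple t : withEntailments(baseFacts())) {
            System.out.println(t.subject() + " " + t.predicate() + " " + t.object());
        }
    }
}
```

The base set is left untouched and the entailed triple appears only in the combined view, which mirrors the separation between asserted and inferred statements described above. A real reasoner applies many such rules and repeats them until no new triples appear.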

A final example shows Jena’s inference processing in action. Since reasoners and ontologies are commonly used together, this example shows both OntModel and one of Jena’s built-in OWL reasoners working in collaboration:

@Override
protected void run() {
  // create an OntModel that also handles OWL reasoning through the rule engine
  OntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM_MICRO_RULE_INF );
  // read in the FOAF ontology
  FileManager.get().readModel( m, FOAF_NS );
  // make a claim about project member Julie
  Resource m10 = m.createResource( NS + "member_10" );
  Property name = m.getProperty( FOAF_NS + "firstName" );
  m10.addProperty( name, "Julie" );
  // now list the named foaf:Persons - we'll only see Julie if we can tell
  // that m10 has rdf:type foaf:Person, which is entailed implicitly
  listPeople( m );
}

This example uses one of the built-in Jena reasoners, based on a Jena-native rule engine. However, the inference model support in Jena works with a variety of other reasoners, including external description logic reasoners such as Pellet.

This introduction to Jena’s Model abstraction has covered some of the core operations in Jena, and touched on some of the key variants of Model that are part of the Jena framework. Other capabilities, which may be the subject of future articles, include:

  • Jena’s built-in RDB Model adapters work with a specific triple store table layout, but there are other tools that extend Model to cover repositories other than triple stores, such as native relational tables or LDAP servers.
  • The examples discussed here created models programmatically, but it’s also possible to describe models using a declarative vocabulary (in RDF, naturally) and have this description assembled into a Jena Model object.
  • Jena’s schemagen tool can automate the translation of ontology terms into Java constants that can be used by Java programs to access RDF and OWL data.

RDF is a simple, flexible, and extensible representation for semi-structured data, and is a foundational technology for the semantic web. Jena is a well-established, open source Java platform for creating, manipulating, and handling RDF data. In Jena, RDF graphs are represented as Model objects, and triples are represented as Statement objects. The model abstraction is the basis for some powerful extensions, including transparent support for databases and inference, and a convenient API for processing ontologies.
