Modeling with RDF, RDFS
RDFS supports the definition of classes and properties based on set inclusion. (Be aware that classes and properties in RDFS are orthogonal: properties are defined independently of classes rather than inside them.) The example in this article does not simply use properties to define data attributes for classes; this differs from object modeling and from the approach used by object-oriented programming languages such as Java, Ruby, and Smalltalk. In addition to facilitating the combination of different data sources, you can use RDFS inferencing to effectively generate new RDF; that is, inferencing asserts new RDF triples.
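To make the inferencing idea concrete, here is a minimal sketch of one RDFS entailment rule: if a resource has rdf:type of some class, and that class is an rdfs:subClassOf another class, a new rdf:type triple can be asserted. The class and resource names are hypothetical, and a real reasoner would compute the full transitive closure rather than this single pass.

```ruby
# Hypothetical triples: an article instance and a subclass relationship.
triples = [
  ["kb:TechCrunchArticle", "rdfs:subClassOf", "kb:NewsArticle"],
  ["kb:article_17",        "rdf:type",        "kb:TechCrunchArticle"]
]

# One application of the rdfs:subClassOf rule: for every (x rdf:type c)
# and (c rdfs:subClassOf d), assert the new triple (x rdf:type d).
def infer_types(triples)
  inferred = []
  triples.each do |s, p, o|
    next unless p == "rdf:type"
    triples.each do |s2, p2, o2|
      inferred << [s, "rdf:type", o2] if p2 == "rdfs:subClassOf" && s2 == o
    end
  end
  inferred
end

infer_types(triples)
# => [["kb:article_17", "rdf:type", "kb:NewsArticle"]]
```

This is exactly the sense in which inferencing "generates new RDF": the inferred triple was never stated explicitly in the data.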
Let's get started with example RDF data from news articles and then look at example programs that show some basic techniques for building semantic web applications.
Person, Place, and Industry Terms Stored in RDF
Using the Reuters OpenCalais system, I wrote a simple Ruby script ruby_utilities/raw_data_to_rdf.rb that reads text files containing news stories and generates RDF data in N-Triples format. Each statement in this format consists of a subject, a predicate, and an object, terminated by a period.
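The following sketch shows the shape of one N-Triples statement: URIs are wrapped in angle brackets, string literals are quoted, and a space-separated period ends the line. The subject URI and literal value here are made up for illustration; the containsIndustryTerm predicate matches the one used in the article's sample data.

```ruby
# Emit one N-Triples statement. URI objects get angle brackets;
# everything else is treated as a quoted string literal.
def n_triple(subject_uri, predicate_uri, object)
  obj = object.start_with?("http") ? "<#{object}>" : object.inspect
  "<#{subject_uri}> <#{predicate_uri}> #{obj} ."
end

puts n_triple(
  "http://news.example.com/story_1",
  "http://knowledgebooks.com/ontology/#containsIndustryTerm",
  "oil prices")
# <http://news.example.com/story_1> <http://knowledgebooks.com/ontology/#containsIndustryTerm> "oil prices" .
```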
The downloadable code for this article contains a sample output RDF N-Triples file called rdf_files/news.nt. Listing 1 contains a few lines from that file. You can see triple elements being defined either in specific namespaces or as string literals. The predicate containsIndustryTerm is defined in the namespace of the knowledgebooks.com domain: <http://knowledgebooks.com/ontology/#containsIndustryTerm>. Namespaces can be abbreviated using a prefix notation that you will use later when you switch to the N3 RDF format.
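The prefix idea can be sketched as a simple string substitution: a namespace URI is mapped to a short prefix, so that full URIs collapse to compact names, as N3's @prefix declarations do. The prefix name "kb:" is my own assumption for illustration.

```ruby
# Map a full namespace URI to a short prefix (the "kb:" name is made up).
PREFIXES = { "http://knowledgebooks.com/ontology/#" => "kb:" }

# Replace a known namespace at the start of a URI with its prefix;
# return the URI unchanged if no prefix matches.
def abbreviate(uri, prefixes = PREFIXES)
  prefixes.each do |ns, prefix|
    return uri.sub(ns, prefix) if uri.start_with?(ns)
  end
  uri
end

abbreviate("http://knowledgebooks.com/ontology/#containsIndustryTerm")
# => "kb:containsIndustryTerm"
```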
So where is RDF data actually stored? Sesame supports an in-memory RDF store, which the Sesame wrapper in the downloadable code uses, as well as several different back-end data store mechanisms. Although these alternative storage back ends can be selected with just a few lines of code (see the Sesame web site for documentation), configuring them is time consuming. I suggest learning the basics of RDF/RDFS modeling and effective SPARQL use and not worrying too much about deployment until you have an interesting application to deploy.
Querying N-Triple RDF Data Using SPARQL
This section shows complete JRuby and Java examples that query N-Triple RDF data using SPARQL. The sections to follow will use just code snippets. Java and/or Ruby programmers should easily make sense of the code in the examples and have few problems using derivative code in their own programs. All the examples use my Sesame wrapper library, which is much simpler than calling the Sesame APIs directly. You eventually may want to use the full Sesame APIs.
Listing 2 is a complete listing of the JRuby example file jruby_sesame_example2.rb. The class TripleStoreSesameManager in Listing 2 is defined in the wrapper library. The method doSparqlQuery requires two arguments: a string containing a valid SPARQL query and an instance of any Ruby class that defines the method processResult. If you have a syntax error in your SPARQL query, the Sesame library will print useful error messages.
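Because the handler argument is duck-typed, any Ruby object that defines processResult will work. The sketch below shows what such a callback class might look like; the method name processResult comes from the article, but the argument shape (an array of binding values per result row) is my assumption, and Sesame itself is not needed to run it.

```ruby
# A duck-typed result handler: the wrapper only requires that the
# object respond to processResult, called once per query result.
class PrintResultHandler
  attr_reader :results

  def initialize
    @results = []
  end

  # Assumed shape: one array of binding values per matched triple.
  def processResult(bindings)
    @results << bindings
    puts bindings.join("\t")
  end
end

handler = PrintResultHandler.new
# Simulate the wrapper invoking the callback for two query results:
handler.processResult(["http://news.example.com/story_1", "Exxon"])
handler.processResult(["http://news.example.com/story_2", "IBM"])
```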
You will find some similarity between SPARQL and SQL. The SELECT statement specifies one, two, or three of the triple terms that should be returned with each query result. Here, I wanted to see only the subject and object because the predicate triple term is defined in the WHERE clause to match <http://knowledgebooks.com/ontology#containsCompany> exactly.
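The query described above has this shape, written here as a Ruby heredoc of the kind you would pass to doSparqlQuery. The variable names ?subject and ?object are arbitrary placeholders; only the predicate URI is fixed.

```ruby
# A SPARQL query string: SELECT names the terms to return, and the
# WHERE clause pins the predicate to the containsCompany property.
sparql = <<-SPARQL
  SELECT ?subject ?object
  WHERE {
    ?subject <http://knowledgebooks.com/ontology#containsCompany> ?object .
  }
SPARQL

puts sparql
```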
JRuby is a good language for working with Sesame because it is dynamically typed and is very terse. Also, the ability to work interactively in an irb console is a big win. Overall, coding experiments are simpler with JRuby than with Java. That said, once I use a dynamic language like JRuby for code experiments, I usually use Java for production work. Listing 3 shows a similar example to Listing 2 using Java and my Sesame wrapper library.
Because Java is strongly typed, the second argument to the method doSparqlQuery in Listing 3 is defined using the interface ISparqlResultHandler, which defines the method signature for processResult.
The remainder of this article concentrates on N3, an RDF format better suited to combining data from different sources that use different schemas, and presents more advanced SPARQL examples.