Java/JRuby Developers, Say Open 'Sesame' to the Semantic Web : Page 3

The semantic web enables you to use information from disparate sources in different formats/schemas without having to convert the data to a standard format. Get an introduction to semantic web application development using Java and JRuby.


A Better RDF Format: N3 Syntax
Frameworks like Sesame that provide RDF storage and querying do not care which RDF format you use for input, so it is wise to pick the format that is easiest to read and understand. For that reason, I use N3 whenever I can. Many years ago, I started working with RDF using its XML serialization format, which I found confusing and generally counterintuitive. N-Triples, Turtle (a simpler form of N3, which this article doesn't discuss), and N3 are all vastly superior to the RDF/XML format.

My wrapper library has an API for calling the Sesame utility code for converting whatever RDF data is in its RDF store to N3. Here is a Java code snippet that shows you how to do this:

public class ConvertTriplesToN3 {
    public static void main(String[] args)
            throws RepositoryException, IOException, RDFParseException, RDFHandlerException {
        TripleStoreSesameManager ts = new TripleStoreSesameManager();
        // Load two N-Triples files into the repository
        ts.loadRDF("rdf_files/rdfs.nt");
        ts.loadRDF("rdf_files/news.nt");
        // Write everything in the repository back out as N3
        ts.saveRepositoryAsN3("sample_N3.n3");
    }
}



Unfortunately, Sesame writes out N3 data without using namespace abbreviations. Listing 4 shows a few lines produced by the above code snippet. You also can use utility programs such as CWM to convert different RDF formats.

N3 allows you to collapse many N-Triples statements (again, each one is a subject, predicate, object, and terminating ".") that share the same subject into a single N3 statement. Let's look at an N3 fragment in more detail (assume that the kb: and rdfs: namespace prefixes are defined):

<http://news.yahoo.com/s/nm/20080616/ts_nm/usa_flooding_dc_16/>
    kb:containsCity "Burlington" , "Denver" ;
    kb:containsRegion "U.S. Midwest" , "Midwest" ;
    kb:containsCountry "United States" , "Japan" .

Here, the subject is the complete URL for the news article on the web. This article has two objects, "Burlington" and "Denver", for the predicate kb:containsCity. Multiple objects for the same predicate are separated by commas. The last object for a predicate is followed by a semicolon, which indicates that the next term starts a new predicate that is itself followed by one or more objects. Notice that the final object list is terminated with a period, which also terminates this N3 statement.
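To make the comma and semicolon shorthand concrete, here is a minimal Java sketch that expands one subject with grouped predicates and objects into one triple per object. It is not an N3 parser and does not use Sesame; the class N3StatementExpander, its expand method, and the example.org subject URL are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Sesame or the wrapper library.
class N3StatementExpander {

    // Expands one subject and its predicate -> objects groups into
    // one "subject predicate object ." line per object.
    static List<String> expand(String subject, Map<String, List<String>> predicateObjects) {
        List<String> triples = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : predicateObjects.entrySet()) {
            for (String object : entry.getValue()) {
                triples.add(subject + " " + entry.getKey() + " " + object + " .");
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        // Same predicates and objects as the N3 fragment; the subject is a placeholder.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("kb:containsCity", Arrays.asList("\"Burlington\"", "\"Denver\""));
        groups.put("kb:containsRegion", Arrays.asList("\"U.S. Midwest\"", "\"Midwest\""));
        groups.put("kb:containsCountry", Arrays.asList("\"United States\"", "\"Japan\""));
        for (String triple : expand("<http://example.org/news/article1>", groups)) {
            System.out.println(triple);
        }
    }
}
```

Three predicates with two objects each expand to six separate triples, which is what a triple store holds after reading the single N3 statement.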

I hand-edited the automatically converted N3 file (very easily, with regular expression search and replace) to add namespace abbreviations; the result is rdf_files/news.n3. Listing 5 shows the first few lines of the file news.n3. The first two lines define namespace abbreviations (or "prefixes"): "rdfs" for RDF Schema and "kb" for my own knowledgebooks.com namespace. These abbreviations are then used to define a new RDFS property, kb:containsPlace, which is a super property of kb:containsRegion, kb:containsCountry, and kb:containsState. Note that in this example I did not make kb:containsCity a sub-property of kb:containsPlace. Using namespace abbreviations makes RDF a lot easier to read.
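The search-and-replace step can also be sketched in Java. This is a hedged illustration of the idea, not necessarily the expressions used to produce news.n3: the class PrefixAbbreviator and its methods are hypothetical, and the kb namespace URI shown is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Sesame or the wrapper library.
class PrefixAbbreviator {

    // Replaces <namespace + localName> with prefix:localName for each known prefix.
    static String abbreviate(String line, Map<String, String> prefixes) {
        String result = line;
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            // Only simple local names are abbreviated; other URIs are left untouched.
            String regex = "<" + Pattern.quote(e.getValue()) + "([A-Za-z_][A-Za-z0-9_]*)>";
            result = result.replaceAll(regex, Matcher.quoteReplacement(e.getKey() + ":") + "$1");
        }
        return result;
    }

    // Builds the "@prefix" declaration that N3 needs for one abbreviation.
    static String prefixDeclaration(String prefix, String namespace) {
        return "@prefix " + prefix + ": <" + namespace + "> .";
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
        prefixes.put("kb", "http://knowledgebooks.com/ontology/#"); // assumed namespace URI
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            System.out.println(prefixDeclaration(e.getKey(), e.getValue()));
        }
        String line = "<http://knowledgebooks.com/ontology/#containsState> "
                + "<http://www.w3.org/2000/01/rdf-schema#subPropertyOf> "
                + "<http://knowledgebooks.com/ontology/#containsPlace> .";
        System.out.println(abbreviate(line, prefixes));
    }
}
```

URIs that do not start with a declared namespace, such as the news article URLs used as subjects, pass through unchanged.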

So how can you use the new super property kb:containsPlace? No triples in the original triple store had a predicate equal to kb:containsPlace; instead, RDFS inferencing uses this property to infer new triples. Some RDF triple stores pre-calculate inferred triples, while others calculate them as needed during SPARQL query processing. For a semantic web developer, it makes no conceptual difference how the triple store works internally, but you likely will face memory-use versus querying-performance tradeoffs.
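The pre-calculation approach can be illustrated with a small, self-contained Java sketch of the rdfs:subPropertyOf rule. It is only a toy model of what a triple store does internally and is not Sesame code: the class SubPropertyInferencer, the triple-as-list representation, the example.org subject, and the "Iowa" sample value are all assumptions made for illustration, and each property is assumed to have at most one direct super property.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy inferencer; not part of Sesame or the wrapper library.
class SubPropertyInferencer {

    // Returns the asserted triples plus every triple implied by rdfs:subPropertyOf.
    // Each triple is a three-element list: subject, predicate, object.
    // superOf maps a sub-property to its direct super property.
    static Set<List<String>> materialize(Set<List<String>> asserted, Map<String, String> superOf) {
        Set<List<String>> all = new LinkedHashSet<>(asserted);
        for (List<String> t : asserted) {
            Set<String> seen = new HashSet<>();
            String p = superOf.get(t.get(1));
            // Walk up the property hierarchy; "seen" guards against cycles.
            while (p != null && seen.add(p)) {
                all.add(Arrays.asList(t.get(0), p, t.get(2)));
                p = superOf.get(p);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        Map<String, String> superOf = new HashMap<>();
        superOf.put("kb:containsRegion", "kb:containsPlace");
        superOf.put("kb:containsCountry", "kb:containsPlace");
        superOf.put("kb:containsState", "kb:containsPlace");
        // kb:containsCity is deliberately not a sub-property of kb:containsPlace.

        Set<List<String>> asserted = new LinkedHashSet<>();
        String article = "<http://example.org/news/article1>"; // placeholder subject
        asserted.add(Arrays.asList(article, "kb:containsState", "\"Iowa\""));
        asserted.add(Arrays.asList(article, "kb:containsCity", "\"Denver\""));

        for (List<String> t : materialize(asserted, superOf)) {
            System.out.println(String.join(" ", t) + " .");
        }
    }
}
```

The kb:containsState triple produces an additional kb:containsPlace triple, while the kb:containsCity triple produces nothing new. A store that infers at query time would apply the same rule while matching patterns instead of storing the extra triples, which is the memory-versus-performance tradeoff mentioned above.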

As an example of RDFS inferencing, suppose that you have one application that runs fine-grained queries for news articles that mention a specific state and another application that searches for all news stories that contain any references to physical locations. The first application could query for triples whose predicate is kb:containsState and whose object matches a string literal for the state name (or you might use 50 URIs to represent states). The second application can use the super property in a SPARQL query like this:

sparql_query = "PREFIX kb: <http://knowledgebooks.com/ontology/#>
SELECT ?subject ?object WHERE { ?subject kb:containsPlace ?object . }";

With RDFS inferencing, this query matches every triple whose predicate is containsRegion, containsCountry, or containsState, because each of these is a sub-property of kb:containsPlace. Notice that SPARQL declares namespace abbreviations (or prefixes) with the PREFIX keyword.

Author's Note: There is no difference in performing SPARQL queries against data that came from different RDF formats. An RDF storage repository like Sesame stores RDF in an efficient internal format. Developers may have a tendency to think of formats as XML RDF or N3 RDF, but once data has been read into a repository, it does not matter which original RDF format was used. It is also important to remember that a single N3 statement generally will define many RDF triples (all with the same subject).

By using RDFS (in this case, defining the super property containsPlace), you can change the way you access RDF data without converting it. In a relational database application, you would need to use either special queries (that would have to change if you wanted to add a new sub property to containsPlace) or new tables or database views. Yes, a relational database solves this reuse problem also, but with much less flexibility than RDFS.

When you have multiple data sources using different schemas/formats, then RDF with RDFS provides even more flexibility, as you will soon see.


