Java/JRuby Developers, Say Open 'Sesame' to the Semantic Web : Page 4

The semantic web enables you to use information from disparate sources in different formats/schemas without having to convert the data to a standard format. Get an introduction to semantic web application development using Java and JRuby.

Matching Data Using Regular Expressions
At some cost in efficiency, you can use regular expression matching in SPARQL queries. As an example of this technique, Listing 6 uses the RDF file rdf_files/oil_example.n3. Its SPARQL query finds all RDF triples whose object contains the word "oil" and whose predicate is kb:containsIndustryTerm. To apply regular expression matching, take one of the previous JRuby example programs (see Listing 2) and change the name of the N3 file that is loaded and the SPARQL query string to this:

sparql_query =
  "PREFIX kb:  
   SELECT ?subject ?object
   WHERE { ?subject kb:containsIndustryTerm ?object FILTER regex(?object, \"oil\") . }"

Here, I added a filter term after ?object that restricts ?object values to strings containing "oil." Two RDF triples match, so two lines (each with the article URL and the object value) get printed out:

  [http://news.yahoo.com/s/nm/20080616/ts_nm/usa_flooding_dc_16/, oil]
  [http://news.yahoo.com/s/nm/20080616/ts_nm/usa_politics_dc_2/, oil prices]
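
To make the effect of the FILTER term concrete, here is a minimal plain-Ruby sketch that applies the same regular expression to an in-memory list of triples. It does not use Sesame at all. The constant TRIPLES and the method filter_by_regex are hypothetical names introduced for illustration; the first two rows reuse the results printed above, and the third row is invented to show a non-match.

```ruby
# Hypothetical stand-in data: [subject, predicate, object] triples.
# Rows 1 and 2 mirror the query results shown in the article;
# row 3 is invented purely to demonstrate a non-matching object.
TRIPLES = [
  ["http://news.yahoo.com/s/nm/20080616/ts_nm/usa_flooding_dc_16/", "kb:containsIndustryTerm", "oil"],
  ["http://news.yahoo.com/s/nm/20080616/ts_nm/usa_politics_dc_2/", "kb:containsIndustryTerm", "oil prices"],
  ["http://example.com/hypothetical_article/", "kb:containsIndustryTerm", "insurance"]
].freeze

# Keep only the triples whose predicate equals `predicate`
# and whose object string matches the regular expression `pattern`.
def filter_by_regex(triples, predicate, pattern)
  triples.select { |_s, p, o| p == predicate && o =~ pattern }
end

# Print each match as [subject, object], like the article's output.
filter_by_regex(TRIPLES, "kb:containsIndustryTerm", /oil/).each do |s, _p, o|
  puts "[#{s}, #{o}]"
end
```

As in the SPARQL version, the regular expression is tested against every candidate object, which is where the loss of efficiency comes from.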

Now, consider a similar but more interesting example: augmenting the regular expression example to find all triples for matched articles. Given the article URLs that were found in the previous example, you can collect a set of all RDF triples with subjects equal to any of the matched article URLs by changing the SPARQL query string to this:

sparql_query =
  "PREFIX kb:  
   SELECT ?subject ?predicate ?object2
   WHERE {
     ?subject kb:containsIndustryTerm ?object FILTER regex(?object, \"oil\") .
     ?subject ?predicate ?object2 .
   }"

This query's WHERE clause contains two triple patterns: the first matches all triples whose predicate is kb:containsIndustryTerm and whose object contains "oil", and the second matches every triple whose subject was bound by the first pattern. Each result row contains three values: subject, predicate, and object.
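
The way the two patterns work together can be sketched in plain Ruby, again without Sesame. This is only an illustration under stated assumptions: ARTICLE_TRIPLES and triples_for_matching_subjects are hypothetical names, and apart from the first article URL and its "oil" term, the data rows are invented placeholders.

```ruby
require "set"

# Hypothetical stand-in data. Only the first subject URL and its "oil"
# term come from the article; all other rows and values are invented.
ARTICLE_TRIPLES = [
  ["http://news.yahoo.com/s/nm/20080616/ts_nm/usa_flooding_dc_16/", "kb:containsIndustryTerm", "oil"],
  ["http://news.yahoo.com/s/nm/20080616/ts_nm/usa_flooding_dc_16/", "kb:containsState", "Example State"],
  ["http://example.com/hypothetical_article/", "kb:containsIndustryTerm", "insurance"],
  ["http://example.com/hypothetical_article/", "kb:containsCity", "Example City"]
].freeze

def triples_for_matching_subjects(triples, predicate, pattern)
  # Pattern 1: collect the subjects that have a `predicate` triple
  # whose object matches the regular expression.
  subjects = triples.select { |_s, p, o| p == predicate && o =~ pattern }
                    .map(&:first).to_set
  # Pattern 2: return every triple that shares one of those subjects.
  triples.select { |s, _p, _o| subjects.include?(s) }
end

triples_for_matching_subjects(ARTICLE_TRIPLES, "kb:containsIndustryTerm", /oil/).each do |t|
  puts t.inspect
end
```

One difference from SPARQL is worth noting: the query returns one row per combination of matches, so a subject with several matching industry terms would have its triples repeated, whereas this sketch returns each triple once.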

Merging Data from Different Sources That Use Different RDF Schemas
Your semantic web application will need to use data from different sources, and the following example shows you how to implement that functionality. In addition to using the rdf_files/news.n3 file from the previous examples, this example will also use rdf_files/news_2.n3, which uses a very different schema:

@prefix ex:  <http://example.com/ontology#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .

kb:about "Academy Award red carpet got wet in the rain" ;
  ex:author "Joy Smith" ;
  ex:location "United States" , "Los Angeles" ;
  kb:keyword "entertainment" , "movies" .

<http://news.yahoo.com/s/made_up_data/made_up_article_2/>
  kb:about "Oil prices rise" ;
  ex:author "Sam Suvy" ;
  ex:location "United States" , "Chicago" ;
  kb:keyword "cars" , "fuel", "oil" .

Looking at this new RDF file, you will see some similarities with the previous example's RDF file news.n3:

  • The news_2.n3 file uses a property location that is similar to the properties in news.n3: containsCity, containsCountry, and containsState. These properties are defined in different namespaces, but that is not a problem (more on this shortly).
  • The news_2.n3 file uses a property keyword that is similar to the property containsIndustryTerm in news.n3. It might make sense to perform fuzzy matches between keyword object values and containsIndustryTerm object values.

The issue of handling locations can be solved by simply adding another property statement:

ex:location rdfs:subPropertyOf kb:containsPlace .

Now any SPARQL query against kb:containsPlace also matches ex:location values (provided the repository applies RDFS inferencing), without your having to modify any data. The second similarity, that both information sources carry lists of keywords or industry terms, can be handled by adding another statement:

ex:keyword rdfs:subPropertyOf kb:containsIndustryTerm .

I prefer using my own knowledgebooks.com namespace in SPARQL queries, but if I wanted to use the ex:keyword property, I could have just reversed the subject and object in this RDF statement.
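
What the two rdfs:subPropertyOf statements buy you can be sketched in plain Ruby. This is a simplified illustration of the idea, not how Sesame implements RDFS inferencing. All names here (SUB_PROPERTY_OF, NEWS_2_TRIPLES, property_with_subproperties, query_property) are hypothetical, the subject "article_2" is a shortened placeholder, and the property names follow the two statements above.

```ruby
# Schema links taken from the two statements above: sub-property => super-property.
SUB_PROPERTY_OF = {
  "ex:location" => "kb:containsPlace",
  "ex:keyword"  => "kb:containsIndustryTerm"
}.freeze

# Hypothetical stand-in rows modeled on the second article in news_2.n3.
NEWS_2_TRIPLES = [
  ["article_2", "ex:location", "Chicago"],
  ["article_2", "ex:keyword", "oil"],
  ["article_2", "ex:author", "Sam Suvy"]
].freeze

# Return `property` plus every property declared, directly or
# transitively, as one of its sub-properties.
def property_with_subproperties(property, sub_property_of)
  found = [property]
  loop do
    added = sub_property_of.select { |sub, sup| found.include?(sup) && !found.include?(sub) }.keys
    break if added.empty?
    found.concat(added)
  end
  found
end

# Query by a property, treating triples that use a sub-property as matches too.
def query_property(triples, property, sub_property_of)
  wanted = property_with_subproperties(property, sub_property_of)
  triples.select { |_s, p, _o| wanted.include?(p) }
end

query_property(NEWS_2_TRIPLES, "kb:containsPlace", SUB_PROPERTY_OF).each { |t| puts t.inspect }
```

The data rows are never rewritten; only the lookup widens to include sub-properties. Reversing a key and value in SUB_PROPERTY_OF corresponds to reversing the subject and object of the RDF statement.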
