Java/JRuby Developers, Say Open ‘Sesame’ to the Semantic Web

The core concept of the semantic web is integrating and using data from different sources. Using semantic web technologies such as RDF/RDFS and the SPARQL query language to integrate and use data from disparate sources has some advantages over using a standard relational database. The Resource Description Framework (RDF) uses predicates to define relationships between data objects, and RDF Schema (RDFS), which is written in RDF, offers a modeling language for knowledge representation and ontology development. (See Sidebar 1. Why RDF/RDFS for the Semantic Web?) Used together, these technologies enable you to use information from disparate sources in different formats/schemas without having to convert the data to a “standard format,” as you would with a relational database.

This article introduces Java developers to semantic web application development using Java and JRuby. It demonstrates how to employ the semantic web’s functionality through an application example that processes news articles to identify and store (in an RDF repository) industry terms and the names of people and places. The example uses the Sesame libraries for RDF storage, RDFS inferencing, and running SPARQL queries, and the downloadable source code provides a simple wrapper API for Sesame and some examples of queries against sample RDF data.

Getting Started
You can find many libraries and frameworks in several programming languages for using semantic web technologies. For this short article, I bypassed many good alternatives and chose one of my favorite toolkits, the Sesame libraries. At some point, take the time to study the complete Sesame APIs, system configuration, and documentation. For the purposes of this article, however, all you need is the downloadable source code: a wrapper API for Sesame, bundled with Sesame and all the libraries that you will need to work through the examples. Specifically, the source code contains:

  • One large JAR file with everything you need for both the Java and JRuby program examples
  • Raw text files from a few Reuters news articles
  • The RDF data files generated by the utility ruby_utilities/raw_data_to_rdf.rb (I wrote the utility raw_data_to_rdf.rb to extract semantic information from the sample news articles and write RDF triples to a data file used in the example programs.)
  • An example of RDF in the more readable N3 format
  • Some JRuby example programs
  • Some RDF data for experimentation
Author’s Note: The program examples are dual licensed. You can use the downloadable source code under either the LGPL or Apache 2 licenses. Sesame itself and the libraries that it requires are licensed under BSD-style and LGPL licenses.

The example uses two data types for object values: URIs and string literals. RDF was originally expressed as XML data files, and while the XML format is still widely used for automated processing, the example uses two alternative formats, N-Triples and Notation3 (N3), because they are much easier to read and understand. Sesame can convert between RDF formats, so why not work in the more readable ones?

RDF data consists of a set of triple values:

  • subject
  • predicate
  • object

In the context of this article, a triple might look like this:

  • subject: A URL (or URI) of a news article
  • predicate: A relation like “containsCity”
  • object: A value like “Burlington”
Figure 1. Conceptual Overview of the News-Processing System: When the Ruby script ruby_utilities/raw_data_to_rdf.rb and the file rdf_files/news.nt are created, you can then use only Sesame with the wrapper API for Java and JRuby.

Figure 1 shows a conceptual overview of the application example. It is conceptual because it does not include code for web scraping. Instead, it uses text manually copied from a few news articles as input to the entity extraction utility raw_data_to_rdf.rb, which identifies human names, place names, and key terms. As Figure 1 shows, once the Ruby script ruby_utilities/raw_data_to_rdf.rb has created the file rdf_files/news.nt, you need only Sesame and the wrapper API for Java and JRuby. This wrapper library can convert N-Triple data to the more convenient N3 format. (Later, you also will see several N3 examples.)

To build a full production system based on the examples in this article, you will need to write Ruby scripts that scrape a few selected news web sites. These scripts are not difficult to write, but a general web scraper that ignores things like advertisements and navigation HTML is very difficult to write, and beyond the scope of this article.

To simplify this system and concentrate only on using RDF/RDFS, I assume that the news articles already exist in the directory raw_data in the Rails application directory, and I do not provide any site-specific web scraping code. This directory contains the text of four Reuters news articles for testing. You can replace these files with data from other information sources (e.g., word-processing documents, PDF files, or databases). The utility ruby_utilities/raw_data_to_rdf.rb reads the data in the directory raw_data, uses the Reuters OpenCalais web service to find entities in each article, and then writes RDF triple data to the file rdf_files/news.nt. The OpenCalais web services can be used free of charge (up to 20K web service calls a day); for my work I use both OpenCalais and my own system to extract information from text.
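
To make the last step concrete, here is a minimal Java sketch of writing extracted entities as N-Triples lines. It is not raw_data_to_rdf.rb (which is Ruby and calls OpenCalais); it assumes the entities for one article have already been extracted into a map, and both the namespace http://example.org/kb# and the article URL are hypothetical placeholders, not the values used in news.nt.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: write already-extracted entities as N-Triples lines.
// KB_NS is a hypothetical placeholder namespace, not the one used in news.nt.
final class EntitiesToNTriples {
    static final String KB_NS = "http://example.org/kb#";

    // Escape backslashes, double quotes, and newlines so each literal stays on one line.
    static String literal(String value) {
        return "\"" + value.replace("\\", "\\\\")
                           .replace("\"", "\\\"")
                           .replace("\n", "\\n") + "\"";
    }

    // One line per (predicate, value) pair: <subject> <predicate> "object" .
    static List<String> toNTriples(String articleUrl, Map<String, List<String>> entities) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : entities.entrySet()) {
            for (String value : e.getValue()) {
                lines.add("<" + articleUrl + "> <" + KB_NS + e.getKey() + "> "
                        + literal(value) + " .");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical extraction result for one hypothetical article URL
        Map<String, List<String>> entities = new LinkedHashMap<>();
        entities.put("containsCity", List.of("Burlington", "Denver"));
        entities.put("containsIndustryTerm", List.of("oil"));
        toNTriples("http://example.org/news/article1", entities)
                .forEach(System.out::println);
    }
}
```

Each map entry fans out into one triple per value, which is why a single article produces many N-Triples lines with the same subject.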

Modeling with RDF, RDFS
RDFS supports the definition of classes and properties based on set inclusion. (Be aware that classes and properties in RDFS are orthogonal.) The examples in this article do not use properties simply to define data attributes for classes, which differs from the object modeling used in object-oriented programming languages such as Java, Ruby, and Smalltalk. In addition to facilitating the combination of different data sources, RDFS inferencing effectively generates new RDF: that is, inferencing asserts new RDF triples.

Let’s get started with example RDF data from news articles and then look at example programs that show some basic techniques for building semantic web applications.

Person, Place, and Industry Terms Stored in RDF
Using the Reuters OpenCalais system, I wrote a simple Ruby script, ruby_utilities/raw_data_to_rdf.rb, that reads text files containing news stories and generates RDF data in the N-Triples format. Each N-Triples statement consists of a subject, a predicate, an object, and a terminating period.
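
As a rough illustration of that four-part structure, the following Java sketch parses a single N-Triples line with a regular expression. It is a simplification for reading purposes only: it handles URIs and plain string literals but not blank nodes, language tags, datatypes, or escape sequences, so in practice you would let Sesame's parser do this work.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified N-Triples line parser. Handles <URI> subjects and predicates and
// <URI> or plain "literal" objects; ignores blank nodes, language tags,
// datatypes, and escape sequences, all of which a real parser must support.
final class NTriplesLine {
    record Triple(String subject, String predicate, String object, boolean objectIsLiteral) {}

    // subject, predicate, object (URI or literal), then the terminating period
    private static final Pattern LINE = Pattern.compile(
            "^\\s*<([^>]*)>\\s+<([^>]*)>\\s+(?:<([^>]*)>|\"([^\"]*)\")\\s*\\.\\s*$");

    static Optional<Triple> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty(); // blank line, comment, or unsupported syntax
        }
        boolean isLiteral = m.group(4) != null;
        String object = isLiteral ? m.group(4) : m.group(3);
        return Optional.of(new Triple(m.group(1), m.group(2), object, isLiteral));
    }

    public static void main(String[] args) {
        String line = "<http://example.org/news/a1> <http://example.org/kb#containsCity> \"Burlington\" .";
        System.out.println(parse(line).orElseThrow());
    }
}
```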

The downloadable code for this article contains a sample output RDF N-Triples file called rdf_files/news.nt. Listing 1 contains a few lines from that file. You can see triple elements being defined either in specific namespaces or as string literals. The predicate containsIndustryTerm is defined in the namespace of the knowledgebooks.com domain: . Namespaces can be abbreviated using a prefix notation that you will use later when you switch to the N3 RDF format.

So where is RDF data actually stored? Sesame supports an in-memory RDF store, which the Sesame wrapper in the downloadable code uses, as well as several different back-end data store mechanisms. Although these alternative storage back ends can be selected with just a few lines of code (see the Sesame web site for documentation), configuring them is time consuming. I suggest learning the basics of RDF/RDFS modeling and effective SPARQL use and not worrying too much about deployment until you have an interesting application to deploy.

Querying N-Triple RDF Data Using SPARQL
This section shows complete JRuby and Java examples that query N-Triple RDF data using SPARQL. The sections to follow will use just code snippets. Java and/or Ruby programmers should easily make sense of the code in the examples and have few problems using derivative code in their own programs. All the examples use my Sesame wrapper library, which is much simpler than calling the Sesame APIs directly. You eventually may want to use the full Sesame APIs.

Listing 2 is a complete listing of the JRuby example file jruby_sesame_example2.rb. The class TripleStoreSesameManager in Listing 2 is defined in the wrapper library. The method doSparqlQuery requires two arguments: a string containing a valid SPARQL query and an instance of any Ruby class that defines the method processResult. If you have a syntax error in your SPARQL query, the Sesame library will print useful error messages.

You will find some similarity between SPARQL and SQL. The SELECT clause specifies which of the triple terms (one, two, or all three) should be returned with each query result. Here, I wanted to see only the subject and object, because the WHERE clause fixes the predicate to an exact value.

JRuby is a good language for working with Sesame because it is dynamically typed and very terse. The ability to work interactively in an irb console is also a big win. Overall, coding experiments are simpler with JRuby than with Java. That said, although I use a dynamic language like JRuby for code experiments, I usually use Java for production work. Listing 3 shows an example similar to Listing 2 using Java and my Sesame wrapper library.

Because Java is statically typed, the second argument to the method doSparqlQuery in Listing 3 is declared using the interface ISparqlResultHandler, which defines the method signature for processResult.
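
To show the callback style in isolation, here is a small, self-contained Java sketch. The interface ResultHandler and its processResult(List<String>) signature are hypothetical stand-ins, not the actual ISparqlResultHandler from the wrapper library, and the "query" is just a scan over an in-memory list that mimics a SELECT with a fixed predicate. Because the interface has a single method, a lambda or method reference can serve as the handler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a result-handler callback. ResultHandler and
// processResult(List<String>) are assumptions, not the wrapper's actual API.
final class CallbackSketch {
    // Prefixed names are kept as plain strings for brevity.
    record Triple(String subject, String predicate, String object) {}

    interface ResultHandler {
        void processResult(List<String> row);
    }

    // Rough analogue of: SELECT ?subject ?object WHERE { ?subject <predicate> ?object }
    static void selectSubjectObject(List<Triple> store, String predicate, ResultHandler handler) {
        for (Triple t : store) {
            if (t.predicate().equals(predicate)) {
                handler.processResult(List.of(t.subject(), t.object()));
            }
        }
    }

    public static void main(String[] args) {
        List<Triple> store = List.of(
                new Triple("http://example.org/a1", "kb:containsCity", "Burlington"),
                new Triple("http://example.org/a1", "kb:containsCountry", "Japan"));
        List<List<String>> rows = new ArrayList<>();
        selectSubjectObject(store, "kb:containsCity", rows::add); // collect rows via a method reference
        System.out.println(rows); // prints [[http://example.org/a1, Burlington]]
    }
}
```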

The remainder of this article concentrates on N3, a more readable RDF format, and on using data from different sources that use different schemas. It also offers more advanced SPARQL examples.

A Better RDF Format: N3 Syntax
Frameworks like Sesame that provide RDF storage and querying functionality do not care which RDF format you use for input. However, it is wise to use the format that is easiest to read and understand. For that reason, I use N3 whenever I can. Many years ago, I started working with RDF using its XML serialization format, which I found confusing and generally counterintuitive. N-Triple, Turtle (a simpler subset of N3, which this article doesn’t discuss), and N3 are all vastly superior to the RDF/XML format.

My wrapper library has an API for calling the Sesame utility code for converting whatever RDF data is in its RDF store to N3. Here is a Java code snippet that shows you how to do this:

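import java.io.IOException;
import org.openrdf.repository.RepositoryException;   // Sesame 2.x package names
import org.openrdf.rio.RDFHandlerException;
import org.openrdf.rio.RDFParseException;

// Load RDFS definitions and news triples, then write the whole repository as N3.
// TripleStoreSesameManager comes from the wrapper library (sesame_wrapper.jar).
public class ConvertTriplesToN3 {
    public static void main(String[] args) throws RepositoryException, IOException,
            RDFParseException, RDFHandlerException {
        TripleStoreSesameManager ts = new TripleStoreSesameManager();
        ts.loadRDF("rdf_files/rdfs.nt");
        ts.loadRDF("rdf_files/news.nt");
        ts.saveRepositoryAsN3("sample_N3.n3");
    }
}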

Unfortunately, Sesame writes out N3 data without using namespace abbreviations. Listing 4 shows a few lines produced by the above code snippet. You also can use utility programs such as CWM to convert different RDF formats.

N3 allows you to collapse many N-Triple RDF statements (again, subject, predicate, object, “.”) that share the same subject into a single N3 statement. Let’s look at an N3 fragment in more detail (assume that the kb: and rdfs: namespace prefixes are defined):

 kb:containsCity "Burlington" , "Denver" ;
	kb:containsRegion "U.S. Midwest" , "Midwest" ;
	kb:containsCountry "United States" , "Japan" .

Here, the subject is the complete URL for the news article on the web. This article has two objects, “Burlington” and “Denver”, for the predicate kb:containsCity. Multiple objects for the same predicate are separated by commas, and the semicolon after the last object indicates that the next term starts a new predicate, which is followed by one or more objects. Notice that the last line ends with a period, which terminates this N3 statement.
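
The collapsing rule can be sketched in a few lines of Java: group the objects by predicate, join objects with commas, join predicates with semicolons, and end with a period. This is a hypothetical helper for illustration (it emits only plain string literals and uses prefixed names as-is), not the Sesame N3 writer.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: collapse triples that share a subject into one N3 statement.
// Objects of one predicate are joined with " , ", predicates with " ;",
// and the statement ends with " ." (plain string literals only).
final class N3Collapse {
    static String toN3(String subject, Map<String, List<String>> predicateToObjects) {
        String body = predicateToObjects.entrySet().stream()
                .map(e -> e.getKey() + " " + e.getValue().stream()
                        .map(v -> "\"" + v + "\"")
                        .collect(Collectors.joining(" , ")))
                .collect(Collectors.joining(" ;\n\t"));
        return subject + " " + body + " .";
    }

    public static void main(String[] args) {
        Map<String, List<String>> p = new LinkedHashMap<>(); // keeps predicate order stable
        p.put("kb:containsCity", List.of("Burlington", "Denver"));
        p.put("kb:containsCountry", List.of("United States", "Japan"));
        System.out.println(toN3("<http://example.org/article1>", p));
    }
}
```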

I hand-edited the file rdf_files/news.n3 (very easily, with regular expression search and replace) to add namespace abbreviations to the automatically converted N3 file. Listing 5 shows the first few lines of the file news.n3. The first two lines define namespace abbreviations (or “prefixes”): “rdfs” for RDF Schema and “kb” for my own knowledgebooks.com namespace. These abbreviations are then used to define a new RDFS property, kb:containsPlace, which is a super property of kb:containsCity, kb:containsCountry, and kb:containsState. Note that in this example I did not make kb:containsRegion a sub-property of kb:containsPlace. Using namespace abbreviations makes RDF a lot easier to read.

So how can you use the new super property kb:containsPlace? No triples in the original triple store had a predicate equal to kb:containsPlace; this property is used to assert new triples using RDFS inferencing. Some RDF triple stores pre-calculate inferred triples, while others calculate them as needed during SPARQL query processing. For a semantic web developer, it makes no conceptual difference how the triple store works internally, but you likely will face tradeoffs between memory use and query performance.
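
Here is a minimal sketch of the pre-calculating approach for a single RDFS rule (if p is a subproperty of q, every (s p o) triple also implies (s q o)), repeated until no new triples appear. It assumes an in-memory set of triples with prefixed names stored as plain strings; it is not how Sesame implements inferencing, only an illustration of why materialized triples cost memory.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Forward chaining for one RDFS rule:
//   (p rdfs:subPropertyOf q) and (s p o)  =>  (s q o)
// Iterating to a fixpoint also covers chains of subPropertyOf statements.
final class SubPropertyMaterializer {
    record Triple(String s, String p, String o) {}

    static final String SUB_PROPERTY_OF = "rdfs:subPropertyOf";

    static Set<Triple> materialize(List<Triple> input) {
        Set<Triple> all = new HashSet<>(input);
        boolean changed = true;
        while (changed) {
            changed = false;
            List<Triple> snapshot = List.copyOf(all); // avoid modifying while iterating
            for (Triple decl : snapshot) {
                if (!decl.p().equals(SUB_PROPERTY_OF)) continue;
                for (Triple t : snapshot) {
                    if (t.p().equals(decl.s())) {
                        // add() returns true only for genuinely new triples
                        changed |= all.add(new Triple(t.s(), decl.o(), t.o()));
                    }
                }
            }
        }
        return all;
    }
}
```

Every inferred triple is stored alongside the original data, so queries become simple lookups at the price of a larger store.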

As an example of RDFS inferencing, suppose that you have one application that runs fine-grained queries for news articles mentioning a specific state, and another application that searches for all news stories that contain any references to physical locations. The first application could match triples whose predicate is kb:containsState and whose object equals a string literal for the state name (or you might use 50 URIs to represent states). The second application can use the super property in a SPARQL query like this:

sparql_query = "PREFIX kb:   
SELECT ?subject ?object WHERE { ?subject kb:containsPlace ?object . }";

This query matches all articles with a predicate equal to containsCity, containsCountry, or containsState. Notice the SPARQL syntax for using namespace abbreviations (or prefixes) with the PREFIX keyword.
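
For comparison with pre-calculated inference, the following hypothetical Java sketch answers the same kind of query at query time: it expands kb:containsPlace to itself plus all of its (transitive) subproperties and then scans for any of them, so no inferred triples are stored. Prefixed names are plain strings, and the data is made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Answering { ?subject kb:containsPlace ?object } without materializing triples:
// expand the property to itself plus its transitive sub-properties, then scan.
final class QueryTimeSubProperty {
    record Triple(String s, String p, String o) {}

    static Set<String> withSubProperties(List<Triple> store, String property) {
        Set<String> result = new HashSet<>(Set.of(property));
        Deque<String> todo = new ArrayDeque<>(List.of(property));
        while (!todo.isEmpty()) {
            String sup = todo.pop();
            for (Triple t : store) {
                if (t.p().equals("rdfs:subPropertyOf") && t.o().equals(sup) && result.add(t.s())) {
                    todo.push(t.s()); // newly found sub-property: look for its own subs
                }
            }
        }
        return result;
    }

    // Returns [subject, object] rows, like SELECT ?subject ?object.
    static List<List<String>> select(List<Triple> store, String property) {
        Set<String> props = withSubProperties(store, property);
        return store.stream()
                .filter(t -> props.contains(t.p()))
                .map(t -> List.of(t.s(), t.o()))
                .collect(Collectors.toList());
    }
}
```

This trades memory for work at query time: nothing extra is stored, but every query pays for the property expansion.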

Author’s Note: It makes no difference which RDF format your data originally used when you perform SPARQL queries. An RDF storage repository like Sesame stores RDF in an efficient internal format. Developers may tend to think in terms of XML RDF or N3 RDF, but once data has been read into a repository, the original format no longer matters. It is also important to remember that a single N3 statement generally defines many RDF triples (all with the same subject).

By using RDFS (in this case, defining the super property containsPlace), you can change the way you access RDF data without converting it. In a relational database application, you would need either special queries (which would have to change if you wanted to add a new subproperty to containsPlace) or new tables or database views. Yes, a relational database solves this reuse problem too, but with much less flexibility than RDFS.

When you have multiple data sources using different schemas/formats, then RDF with RDFS provides even more flexibility, as you will soon see.

Matching Data Using Regular Expressions
With some loss of efficiency, you can use regular expression matching in SPARQL queries. As an example of this technique, Listing 6 uses the RDF file rdf_files/oil_example.n3. The Listing 6 SPARQL query seeks to find all RDF triples that contain the word “oil” in the object field, where the predicate field of the triple is equal to kb:containsIndustryTerm. To apply regular expression matching, use one of the previous JRuby example programs (see Listing 2) and change the name of the N3 file loaded and the SPARQL query string to this:

tsm.loadRDF("rdf_files/oil_example.n3")
sparql_query = "PREFIX kb:
  SELECT ?subject ?object
  WHERE { ?subject kb:containsIndustryTerm ?object FILTER regex(?object, \"oil\") . }"

Here, I added a filter term after ?object that restricts ?object values to strings containing “oil.” Two RDF triples match, so two lines (each with the article URL and the object value) get printed out:

  [http://news.yahoo.com/s/nm/20080616/ts_nm/usa_flooding_dc_16/, oil]
  [http://news.yahoo.com/s/nm/20080616/ts_nm/usa_politics_dc_2/, oil prices]
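
The filter's behavior can be approximated in plain Java, which may help when reasoning about what regex() matches. This is a sketch under assumptions: the triples live in an in-memory list, and java.util.regex stands in for SPARQL's XPath-style regular expressions, which differ in some details. Like regex() without flags, Matcher.find() is unanchored and case-sensitive.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Rough in-memory analogue of:
//   SELECT ?subject ?object
//   WHERE { ?subject kb:containsIndustryTerm ?object FILTER regex(?object, "oil") . }
final class RegexFilterSketch {
    record Triple(String s, String p, String o) {}

    static List<List<String>> select(List<Triple> store, String predicate, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return store.stream()
                .filter(t -> t.p().equals(predicate))         // the triple pattern
                .filter(t -> pattern.matcher(t.o()).find())  // the FILTER: substring-style match
                .map(t -> List.of(t.s(), t.o()))
                .collect(Collectors.toList());
    }
}
```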

Now, consider a similar but more interesting example: augmenting the regular expression example to find all triples for matched articles. Given the article URLs that were found in the previous example, you can collect a set of all RDF triples with subjects equal to any of the matched article URLs by changing the SPARQL query string to this:

sparql_query = "PREFIX kb:
  SELECT ?subject ?predicate ?object2
  WHERE {
    ?subject kb:containsIndustryTerm ?object FILTER regex(?object, \"oil\") .
    ?subject ?predicate ?object2 .
  }"

This query’s WHERE clause contains two triple patterns: the first matches all triples with a predicate equal to kb:containsIndustryTerm (and an object containing “oil”), and the second matches all triples whose subject matched the first pattern. Each result contains three values: subject, predicate, and object.
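
In procedural terms, the two patterns behave like a join on ?subject. The hypothetical Java sketch below first collects the subjects that satisfy the first pattern and then returns every triple with one of those subjects. It returns each triple once, which corresponds to SELECT DISTINCT; the SPARQL query as written can repeat rows when an article has several matching industry terms.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// In-memory analogue of the two-pattern query, joined on ?subject.
final class JoinSketch {
    record Triple(String s, String p, String o) {}

    static List<Triple> allTriplesOfMatchingSubjects(List<Triple> store, String predicate, String regex) {
        Pattern pattern = Pattern.compile(regex);
        // Pattern 1: ?subject <predicate> ?object FILTER regex(?object, ...)
        Set<String> subjects = store.stream()
                .filter(t -> t.p().equals(predicate) && pattern.matcher(t.o()).find())
                .map(Triple::s)
                .collect(Collectors.toSet());
        // Pattern 2: ?subject ?predicate ?object2, restricted to those subjects
        return store.stream()
                .filter(t -> subjects.contains(t.s()))
                .collect(Collectors.toList());
    }
}
```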

Merging Data from Different Sources That Use Different RDF Schemas
Your semantic web application will need to use data from different sources, and the following example shows you how to implement that functionality. In addition to using the rdf_files/news.n3 file from the previous examples, this example will also use rdf_files/news_2.n3, which uses a very different schema:

@prefix ex:   .
@prefix rdfs:   .
ex:about "Academy Award red carpet got wet in the rain" ;
    ex:author "Joy Smith" ;
    ex:location "United States" , "Los Angeles" ;
    ex:keyword "entertainment" , "movies" .
ex:about "Oil prices rise" ;
    ex:author "Sam Suvy" ;
    ex:location "United States" , "Chicago" ;
    ex:keyword "cars" , "fuel" , "oil" .

Looking at this new RDF file, you will see some similarities with the previous example’s RDF file news.n3:

  • The news_2.n3 file uses a property location that is similar to the properties in news.n3: containsCity, containsCountry, and containsState. These properties are defined in different namespaces, but that is not a problem (more on this shortly).
  • The news_2.n3 file uses a property keyword that is similar to the property containsIndustryTerm in news.n3. It might make sense to perform fuzzy matches between keyword object values and containsIndustryTerm object values.
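
What counts as a fuzzy match is a design decision. One simple option, sketched below as an assumption rather than a recommendation, is to lowercase both values, split them into words, and compare the word sets with Jaccard similarity (shared words divided by total distinct words). The class name and the threshold are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical fuzzy match between an ex:keyword value and a
// kb:containsIndustryTerm value using word-set Jaccard similarity.
final class KeywordMatcher {
    // Lowercase, split on anything that is not a letter or digit, drop empties.
    static Set<String> words(String s) {
        return Arrays.stream(s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toSet());
    }

    static double jaccard(String a, String b) {
        Set<String> wa = words(a), wb = words(b);
        if (wa.isEmpty() && wb.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(wa);
        inter.retainAll(wb);
        Set<String> union = new HashSet<>(wa);
        union.addAll(wb);
        return (double) inter.size() / union.size();
    }

    static boolean matches(String keyword, String industryTerm, double threshold) {
        return jaccard(keyword, industryTerm) >= threshold;
    }
}
```

With this measure, the keyword “oil” and the industry term “oil prices” share one of two distinct words (similarity 0.5), while “fuel” and “oil prices” share none.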

The issue of handling locations can be solved by simply adding another property statement:

ex:location rdfs:subPropertyOf kb:containsPlace .

Now SPARQL queries against kb:containsPlace also match ex:location triples, without your having to modify any data. For the second similarity, that both information sources have lists of keywords or industry terms, you can add another statement:

ex:keyword rdfs:subPropertyOf kb:containsIndustryTerm .

I prefer using my own knowledgebooks.com namespace in SPARQL queries, but if I wanted to use the ex:keyword property, I could have just reversed the subject and object in this RDF statement.

Using Classes in RDFS Modeling
You may be surprised that all of the examples so far have dealt with RDFS properties and not RDFS classes. As previously mentioned, RDFS properties and RDFS classes are orthogonal in the sense that properties are not used to define attributes (or class variables) for RDFS classes. You can add and use properties with classes in an ad hoc way, extending classes and the use of properties at any time. The following example for using classes in RDFS modeling uses the N3 file rdf_files/class_example.n3:

@prefix kb:   .
@prefix rdf:  .
@prefix rdfs:   .
@prefix foaf:  .
foaf:Person rdfs:subClassOf foaf:Agent .
kb:KnowledgeEngineer rdfs:subClassOf foaf:Person .
 a kb:KnowledgeEngineer .

Notice that the last statement uses the abbreviation a as its predicate; a is shorthand for rdf:type, so the statement means that the subject URI is a member of the class kb:KnowledgeEngineer. The following SPARQL query will print out all the subjects and predicates for triples whose object is equal to foaf:Agent:

require "java"
require "sesame_wrapper.jar"
require 'pp'
include_class "TripleStoreSesameManager"
include_class "DefaultSparqlResultHandler"

tsm = TripleStoreSesameManager.new
tsm.loadRDF("rdf_files/class_example.n3")
# Find every triple (asserted or inferred) whose object is foaf:Agent
sparql_query = "PREFIX foaf:
  SELECT ?subject ?predicate WHERE { ?subject ?predicate foaf:Agent . }"
tsm.doSparqlQuery(sparql_query, DefaultSparqlResultHandler.new)

Listing 7 shows the output from running this example. Notice a couple of interesting things:

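  • My subject URI is of type foaf:Agent. It is declared to be of type kb:KnowledgeEngineer, and by logical inference it is also of type foaf:Person and foaf:Agent, because kb:KnowledgeEngineer is a subclass of foaf:Person, which in turn is a subclass of foaf:Agent.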
  • Both foaf:Person and are of type foaf:Agent.
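
The type chain in the first bullet follows from two RDFS rules: an instance of a class is also an instance of each superclass, and rdfs:subClassOf is transitive. The following Java sketch computes that closure for the class_example.n3 hierarchy using a map of direct subClassOf links; it illustrates the reasoning only and is not Sesame's inferencer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// All types of a resource, given its declared type and direct subClassOf links:
//   (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D), applied transitively.
final class TypeInference {
    static Set<String> allTypes(String declaredType, Map<String, List<String>> subClassOf) {
        Set<String> types = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(declaredType));
        while (!todo.isEmpty()) {
            String c = todo.removeFirst();
            if (types.add(c)) { // skip classes already seen, so cycles terminate
                todo.addAll(subClassOf.getOrDefault(c, List.of()));
            }
        }
        return types;
    }

    public static void main(String[] args) {
        Map<String, List<String>> subClassOf = Map.of(
                "kb:KnowledgeEngineer", List.of("foaf:Person"),
                "foaf:Person", List.of("foaf:Agent"));
        System.out.println(allTypes("kb:KnowledgeEngineer", subClassOf));
        // prints [kb:KnowledgeEngineer, foaf:Person, foaf:Agent]
    }
}
```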

It is often interesting and useful to make “broad” SPARQL queries like this example to see the triples that Sesame (or any other RDF triple store) asserts through inference.

Where to Go from Here
Now that you have seen how to employ the semantic web’s functionality using Java and JRuby, you can write derivative code from the examples in this article to build your own semantic web programs.

I suggest that you look in two directions for starting your own semantic web projects:

  • Publish your own data sources as RDF, and then provide consumers of your data with RDFS and example SPARQL queries to help them get started.
  • Identify sources of RDF data that can enhance your own web applications. Use SPARQL queries to collect data for your own use.