Relational Database Integration with RDF/OWL : Page 4

Using the W3C OWL ontology standard lets you get more out of all kinds of data. Find out how this standard and some free software let you query two databases as if they were one.

Querying the Rules and Data
Now that your postSwoop.pdf file has a combination of data and OWL rules, you could issue Pellet SPARQL queries against it. We want to reuse this ontology with future versions of the database as its content evolves, though, so use a text editor to delete all the parts of postSwoop.pdf that SWOOP didn't generate. This includes the address book data about specific people; to make that part easier to find, the dump from D2RQ should include an XML comment that says "Instances." After making this deletion, save the file as properties.owl.
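If you expect to repeat this cleanup after each SWOOP session, you could script it. The following Python sketch assumes the file is RDF/XML and uses a simple heuristic of its own: any top-level child of rdf:RDF in the OWL or RDFS namespace (owl:Class, owl:DatatypeProperty, and so on) counts as ontology content, and everything else counts as instance data. The function name strip_instances is hypothetical, and this is not how SWOOP or D2RQ decide anything; compare the result against the "Instances" comment before relying on it.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Keep familiar prefixes when the tree is written back out.
for prefix, uri in (("rdf", RDF), ("owl", OWL), ("rdfs", RDFS)):
    ET.register_namespace(prefix, uri)

# Assumed heuristic: top-level elements in these namespaces are ontology content.
ONTOLOGY_NAMESPACES = {OWL, RDFS}

def namespace_of(tag: str) -> str:
    """Return the namespace URI of an ElementTree tag such as '{uri}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def strip_instances(rdf_xml: str) -> str:
    """Drop every top-level element of rdf:RDF that is not OWL or RDFS markup."""
    root = ET.fromstring(rdf_xml)
    for child in list(root):  # iterate over a copy because we remove while looping
        if namespace_of(child.tag) not in ONTOLOGY_NAMESPACES:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")
```

To use it, read the SWOOP output, pass its text to strip_instances(), and write the returned string to properties.owl. ElementTree discards XML comments while parsing, so the "Instances" comment won't survive either.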

Now imagine that the data in MySQL has been updated, you've created a more up-to-date version of datadump.rdf, and you want to run some queries against the combination of the data in datadump.rdf and the metadata in properties.owl. For the purposes of this demo, you can use the datadump.rdf file left over from before.

To make it easy to combine the two files, I wrote a short XSLT 1.0 stylesheet named rdfcat.xsl (included in the zip file). When run against a file like the following, it combines the contents of the referenced files into a single RDF file:

<rdfcat xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="myfile1.rdf"/>
  <xi:include href="myfile2.rdf"/>
  <xi:include href="myfile3.rdf"/>
</rdfcat>

The stylesheet assumes that you're using an XSLT processor such as libxslt or Saxon that also implements the W3C XInclude specification. It also adds an OWL declaration for each resource it finds, making that resource a member of the owl:Thing class so that the OWL reasoner doesn't complain about resources that haven't been declared as members of a class.

After pulling RDF from D2RQ and saving it in a file, I created an rdfcat.xml file like the one above, with xi:include elements pointing to the files I wanted concatenated. The following command line uses libxslt's xsltproc program and rdfcat.xml to combine the latest version of datadump.rdf with the properties.owl file into a file called combo.rdf:

xsltproc --xinclude rdfcat.xsl rdfcat.xml > combo.rdf
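For readers without an XInclude-aware XSLT processor handy, here is a rough Python analogue of what the text says rdfcat.xsl does: merge the children of several rdf:RDF documents and declare each resource an owl:Thing. It is a sketch, not a port of the stylesheet in the zip file. It takes document strings instead of resolving xi:include elements, and it only declares subjects of non-OWL, non-RDFS elements so that classes and properties aren't also typed as individuals; that restriction is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
ABOUT = f"{{{RDF}}}about"
ET.register_namespace("rdf", RDF)
ET.register_namespace("owl", OWL)

def is_schema(tag: str) -> bool:
    """True for OWL/RDFS elements such as owl:Class or owl:DatatypeProperty."""
    return tag.startswith((f"{{{OWL}}}", f"{{{RDFS}}}"))

def rdfcat(*documents: str) -> str:
    """Merge the rdf:RDF children of several RDF/XML documents into one document,
    then declare each instance subject an owl:Thing."""
    combined = ET.Element(f"{{{RDF}}}RDF")
    subjects = []  # instance URIs in first-seen order, without duplicates
    for doc in documents:
        for child in ET.fromstring(doc):
            combined.append(child)
            uri = child.get(ABOUT)
            # Assumption: only non-schema nodes are typed as owl:Thing.
            if uri and not is_schema(child.tag) and uri not in subjects:
                subjects.append(uri)
    for uri in subjects:
        ET.SubElement(combined, f"{{{OWL}}}Thing", {ABOUT: uri})
    return ET.tostring(combined, encoding="unicode")
```

Passing the text of datadump.rdf and properties.owl to rdfcat() and writing the result to combo.rdf plays roughly the same role as the xsltproc command above.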

The combo.rdf file is the combination of ontology metadata and formerly relational data that we've been working toward, and we can now run SPARQL queries against it that implement our use cases. The first query, nyworkers.spq, asks for all subjects that have a file://eudora/entries_workState value of "NY":

PREFIX e: <file://eudora/>
SELECT * WHERE { ?s e:entries_workState "NY" }

There are many SPARQL engines to choose from, but not all of them implement OWL reasoning. Pellet is a free one that does, so I used it:

pellet -if combo.rdf -ifmt RDF/XML -qf nyworkers.spq > nyworkers.out

Because of the equivalence relationship that you defined, Pellet should list the subjects of both the e:entries_workState triples and the out:entries_businessState triples that have a value of "NY." You can't be sure the result is correct without checking how many such entries are in each of your two databases, but I found that the random data generation script put four or five in each database, so if this query retrieves more than six or seven, it's good news.
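To see why one query reaches both databases, here is a toy Python model of the inference. Triples are plain tuples with made-up subject names, and a pattern on one property also matches every property linked to it through owl:equivalentProperty, which is symmetric and transitive. It captures only that one piece of OWL semantics and says nothing about how Pellet is implemented.

```python
def equivalence_groups(pairs):
    """Map each property to the set of properties equivalent to it.

    owl:equivalentProperty is symmetric and transitive, so pairs are merged
    into groups that every member points to."""
    groups = {}
    for a, b in pairs:
        merged = groups.get(a, {a}) | groups.get(b, {b})
        for prop in merged:
            groups[prop] = merged
    return groups

def subjects_with(triples, prop, value, equivalences):
    """Answer { ?s prop "value" }, also matching properties equivalent to prop."""
    props = equivalence_groups(equivalences).get(prop, {prop})
    return sorted({s for s, p, o in triples if p in props and o == value})

# Made-up sample data; the prefixes stand for the eudora and outlook namespaces.
triples = [
    ("eud:entry1", "e:entries_workState", "NY"),
    ("out:entry7", "out:entries_businessState", "NY"),
    ("out:entry8", "out:entries_businessState", "CA"),
]
equivalences = [("e:entries_workState", "out:entries_businessState")]

print(subjects_with(triples, "e:entries_workState", "NY", equivalences))
# ['eud:entry1', 'out:entry7']
```

Dropping the equivalence pair leaves only eud:entry1 in the result, which is the single-database answer you'd get without the OWL metadata.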

Pellet also outputs a few suggestions for OWL statements that would make combo.rdf a little more OWL DL compliant. The rdfcat.xsl stylesheet adds statements that account for a few of these, and it could add a few more.

The following shows AAphone.spq, which asks for all of Alfred Adams's phone numbers, whether the database stores them as his home phone, work phone, mobile phone, or any other kind of phone number. The URL assigned to the e: prefix is the one we used when defining the new phone property, which is the superproperty of the various phone number properties.

PREFIX e: <http://localhost/entries/>
PREFIX eud: <file://eudora/>
SELECT ?phoneType ?phone
WHERE {
  ?s ?phoneType ?phone.
  ?s e:phone ?phone.
  ?s eud:entries_lastName "Adams".
  ?s eud:entries_firstName "Alfred".
}
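The superproperty approach relies on the RDFS rule that if p is an rdfs:subPropertyOf q and the triple s p o holds, then s q o holds as well. This Python sketch applies that rule to a few made-up triples until nothing new turns up, then answers the same question as AAphone.spq. The eud:entries_homePhone, eud:entries_workPhone, and eud:entries_mobilePhone names and the phone numbers are hypothetical stand-ins for the real column properties.

```python
def infer_superproperties(triples, super_of):
    """Forward-chain rdfs:subPropertyOf until no new triples appear.

    super_of maps a property to the set of its direct superproperties."""
    kb = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(kb):
            for sup in super_of.get(p, ()):
                if (s, sup, o) not in kb:
                    kb.add((s, sup, o))
                    changed = True
    return kb

def phones_for(kb, first, last):
    """Mirror AAphone.spq: ?s ?phoneType ?phone . ?s e:phone ?phone . plus the name patterns."""
    people = ({s for s, p, o in kb if p == "eud:entries_lastName" and o == last}
              & {s for s, p, o in kb if p == "eud:entries_firstName" and o == first})
    phones = {(s, o) for s, p, o in kb if s in people and p == "e:phone"}
    return sorted((p, o) for s, p, o in kb if (s, o) in phones)

super_of = {  # hypothetical phone properties, each declared a subproperty of e:phone
    "eud:entries_homePhone": {"e:phone"},
    "eud:entries_workPhone": {"e:phone"},
    "eud:entries_mobilePhone": {"e:phone"},
}
triples = [
    ("eud:entry1", "eud:entries_firstName", "Alfred"),
    ("eud:entry1", "eud:entries_lastName", "Adams"),
    ("eud:entry1", "eud:entries_homePhone", "555-0100"),
    ("eud:entry1", "eud:entries_mobilePhone", "555-0199"),
]
for phone_type, phone in phones_for(infer_superproperties(triples, super_of), "Alfred", "Adams"):
    print(phone_type, phone)
```

In this model ?phoneType also binds to e:phone itself, so each number appears once under its specific property and once under e:phone. Without the inference step, the query finds nothing, because no stored triple uses e:phone directly.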

The last use case, stored in FisherData.spq in the zip file, asks for all data about the Bobby Fisher entry, which is from the eudora database. Because we defined eudora:entries_email1 as an inverse functional property, Pellet knows that any entries with an email1 value of mailto:bobby416@gmail.com must represent the same individual. Because we also defined eudora:entries_email1 as equivalent to out:entries_email2Address, that reasoning extends to the outlook database: Pellet identifies its Robert L. Fisher entry as the same person and pulls that entry's data for any other properties defined as equivalent to eudora properties.

PREFIX e: <file://eudora/>
PREFIX o: <file://outlook/>
SELECT * WHERE { <http://localhost/addressbook#entries/Bobby/Fisher> ?p ?o }
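The key reasoning step is the meaning of owl:InverseFunctionalProperty: if two subjects share a value for such a property, they denote the same individual. The Python sketch below models that with a union-find structure. Because eudora:entries_email1 is declared equivalent to out:entries_email2Address, both properties are passed in as one inverse functional key, which is how the rule crosses databases. The outlook subject URIs, the name properties, and their values are made up, and returning every triple about the merged individual is a simplification of what a reasoner does.

```python
def same_individuals(triples, ifp_props):
    """Return a find() function that maps each subject to a representative,
    merging subjects that share a value for any property in ifp_props."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    first_owner = {}  # IFP value -> first subject seen with that value
    for s, p, o in triples:
        if p in ifp_props:
            if o in first_owner:
                parent[find(s)] = find(first_owner[o])  # same value, same individual
            else:
                first_owner[o] = s
    return find

def describe(triples, subject, ifp_props):
    """Answer { <subject> ?p ?o } over the merged individual."""
    find = same_individuals(triples, ifp_props)
    target = find(subject)
    return sorted((p, o) for s, p, o in triples if find(s) == target)

BOBBY = "http://localhost/addressbook#entries/Bobby/Fisher"
ROBERT = "http://localhost/outlook/entries/7"  # hypothetical outlook subject
triples = [
    (BOBBY, "eudora:entries_email1", "mailto:bobby416@gmail.com"),
    (BOBBY, "eudora:entries_firstName", "Bobby"),
    (ROBERT, "out:entries_email2Address", "mailto:bobby416@gmail.com"),
    (ROBERT, "out:entries_firstName", "Robert"),
    (ROBERT, "out:entries_middleName", "L."),
    ("http://localhost/outlook/entries/8", "out:entries_firstName", "Alfred"),
]
ifp = {"eudora:entries_email1", "out:entries_email2Address"}
for p, o in describe(triples, BOBBY, ifp):
    print(p, o)
```

Passing only eudora:entries_email1 as the key leaves the two entries unmerged, which shows why the equivalence declaration matters as much as the inverse functional one.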

More Queries to Try
Metadata for metadata's sake isn't worth the trouble of adding it. The goal of each bit of metadata was to let a user answer real address book questions more easily. Another classic OWL tweak to the ontology would be to represent the company home pages and other web addresses in the data as object properties instead of datatype properties, but I couldn't think of a query that would demonstrate how this metadata made the database more useful.

Another nice bit of database integration metadata that OWL enables is the indication that a field in one database is not equivalent to a certain field in another database, but a subset of it. For example, if an international address book were incorporated into this data, we'd want to show that American zip codes are postal codes, but that not all postal codes are zip codes. You do this by making the zip code property a subproperty of the postal code property, much as you did with the phone properties. A query against postal codes would then also check zip codes, but not vice versa.
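Here is a quick way to convince yourself that the relationship works in only one direction. This Python sketch uses hypothetical ex:zipCode and ex:postalCode properties and made-up values. Rather than materializing inferred triples the way the phone sketch does, it expands the queried property at lookup time to include that property's subproperties, which is enough to show the asymmetry.

```python
# Hypothetical declaration: ex:zipCode rdfs:subPropertyOf ex:postalCode
SUBPROPERTY_OF = {"ex:zipCode": "ex:postalCode"}

def with_subproperties(prop):
    """Return prop plus every property declared, directly or indirectly, beneath it."""
    result = {prop}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUBPROPERTY_OF.items():
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result

def values_of(triples, prop):
    """Answer { ?s prop ?value }, honoring subproperty declarations."""
    props = with_subproperties(prop)
    return sorted(o for s, p, o in triples if p in props)

triples = [
    ("ex:usEntry", "ex:zipCode", "10001"),
    ("ex:ukEntry", "ex:postalCode", "SW1A 1AA"),
]
print(values_of(triples, "ex:postalCode"))  # ['10001', 'SW1A 1AA']
print(values_of(triples, "ex:zipCode"))     # ['10001']
```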

Demonstrating additional property characteristics besides inverse functional would also be valuable. In a demo similar to this one, I added data indicating which entry represented the spouse of which other entry. Jane Smith's entry pointed to Joe Smith as her spouse, "spouse" was defined as a symmetric property, and Joe's entry listed neither a home phone number nor a spouse. Because the property was symmetric, Pellet could infer that Joe's spouse was Jane. I had also defined a rule saying that if someone has no home phone number but their spouse does, a query for that person's home phone number should return the spouse's number, so a query for Joe's home phone number returned Jane's. The definition of such rules is not a standard part of OWL, but Pellet supports them, and work toward a standard definition of such a rule language is underway.
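The two inference steps in that spouse example can be modeled in a few lines of Python. This is a sketch under its own assumptions: the symmetric property is handled by adding the reverse of each ex:spouse triple, and the non-OWL rule is an ordinary function evaluated at query time rather than a rule written in Pellet's syntax. The ex: property names and the phone number are hypothetical.

```python
def close_symmetric(triples, prop):
    """owl:SymmetricProperty: for every (a, prop, b), also assert (b, prop, a)."""
    return set(triples) | {(o, p, s) for s, p, o in triples if p == prop}

def home_phone(kb, person):
    """Rule: return the person's own home phone, or else a spouse's home phone."""
    own = sorted(o for s, p, o in kb if s == person and p == "ex:homePhone")
    if own:
        return own
    spouses = {o for s, p, o in kb if s == person and p == "ex:spouse"}
    return sorted(o for s, p, o in kb if s in spouses and p == "ex:homePhone")

triples = {
    ("ex:JaneSmith", "ex:spouse", "ex:JoeSmith"),
    ("ex:JaneSmith", "ex:homePhone", "555-0123"),
}
print(home_phone(triples, "ex:JoeSmith"))                                # [] without symmetry
print(home_phone(close_symmetric(triples, "ex:spouse"), "ex:JoeSmith"))  # ['555-0123']
```

The first call shows that the rule alone isn't enough: Joe's entry has no spouse link until the symmetric property supplies one.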

Dumping all of the data is not a very practical way to enhance it with metadata. Instead of converting all of the relational data to RDF each time the database changes so that Pellet can query an up-to-date combination of data and metadata, it would be nice to translate SPARQL queries to SQL on the fly, letting us issue SPARQL queries directly against the relational data. This, in fact, is what D2RQ does, but D2RQ currently offers no way to load ontology triples into the knowledge base along with the relational data, unless you want to try storing the OWL statements in a relational table, which could be very interesting. D2RQ isn't the only project making such technology available, but it is free, and as more software supports SPARQL and OWL, the combination will open up some great new possibilities for getting more out of our relational databases.
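To make the on-the-fly idea concrete, here is a deliberately tiny Python sketch that rewrites a single triple pattern with a literal object, like the one in nyworkers.spq, into parameterized SQL and runs it against an in-memory SQLite table. The mapping dictionary, table layout, and subject URI pattern are hypothetical, and nothing here reflects D2RQ's actual implementation, which uses its own mapping language and handles far more of SPARQL.

```python
import sqlite3

# Hypothetical mapping from RDF property URI to (table, column), loosely in the
# spirit of a D2RQ mapping file.
MAPPING = {"file://eudora/entries_workState": ("entries", "workState")}
SUBJECT_PREFIX = "http://localhost/entries/"  # hypothetical URI pattern for rows

def pattern_to_sql(prop, value):
    """Translate the triple pattern { ?s <prop> "value" } into SQL and parameters."""
    table, column = MAPPING[prop]
    # Identifiers come from the trusted mapping, never from the query text,
    # so only the literal value needs to be a bound parameter.
    return f"SELECT id FROM {table} WHERE {column} = ? ORDER BY id", (value,)

def query_pattern(conn, prop, value):
    """Run the translated SQL and turn each primary key back into a subject URI."""
    sql, params = pattern_to_sql(prop, value)
    return [SUBJECT_PREFIX + str(row[0]) for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, workState TEXT)")
conn.executemany("INSERT INTO entries (id, workState) VALUES (?, ?)",
                 [(1, "NY"), (2, "CA"), (3, "NY")])
print(query_pattern(conn, "file://eudora/entries_workState", "NY"))
# ['http://localhost/entries/1', 'http://localhost/entries/3']
```

Like D2RQ as described above, this sketch leaves the ontology out entirely: an equivalent outlook property would need its own mapping entry before a query on one could reach the other.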

Bob DuCharme, a solutions architect at Innodata Isogen, was an XML "expert" when XML was a four-letter word. He's written four books and nearly 100 on-line and print articles about information technology without using the word "functionality" in any of them. See his blog at snee.com/bobdc.blog for more.