Querying the Rules and Data
Now that your postSwoop.pdf file has a combination of data and OWL rules, you could issue Pellet SPARQL queries against it. However, we want to use this ontology with future versions of the database as the content evolves, so use a text editor to delete all the parts of postSwoop.pdf that SWOOP didn't generate. This includes the parts with address book data about specific people; in the dump from D2RQ, there should be an XML comment that says "Instances" to make this easier. After making this deletion, save the file as properties.owl.
Let's imagine that the data in MySQL was updated, you created a more up-to-date version of datadump.rdf, and you want to run some queries against the combination of the data in datadump.rdf and the metadata in properties.owl. For the purposes of the demo, you can use the datadump.rdf file left over from before.
To make it easy to combine the two files, I created a short XSLT 1.0 stylesheet (included in the zip file) named rdfcat.xsl. When run against a file like the following, this stylesheet combines the files named in the xi:include elements into a single RDF file:
<rdfcat xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="myfile1.rdf"/>
<xi:include href="myfile2.rdf"/>
<xi:include href="myfile3.rdf"/>
</rdfcat>
The stylesheet assumes that you're using an XSLT processor such as
libxslt or Saxon that also implements the W3C XInclude specification. The stylesheet also adds an OWL declaration for each resource it finds, calling it a member of the owl:Thing class so that the OWL reasoner doesn't complain about resources that haven't been declared as a member of a class.
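To picture what that added declaration amounts to, here is a minimal sketch of a single owl:Thing membership statement in RDF/XML. The resource URI is borrowed from the Bobby Fisher query later in this article purely for illustration, and the exact markup that rdfcat.xsl emits may well differ.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <!-- Illustration only: declares one resource to be a member of owl:Thing
       so that an OWL reasoner has a class membership on record for it. -->
  <owl:Thing rdf:about="http://localhost/addressbook#entries/Bobby/Fisher"/>
</rdf:RDF>
```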
To combine the data with the metadata, I created an rdfcat.xml file in the same format as the sample above, listing datadump.rdf and properties.owl as the files I wanted concatenated together.
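A minimal sketch of what such an rdfcat.xml file could look like, following the sample format shown earlier. It assumes that datadump.rdf and properties.owl sit in the same directory as rdfcat.xml; adjust the href values if yours are elsewhere.

```xml
<!-- rdfcat.xml: names the files for rdfcat.xsl to concatenate.
     Assumes both files are in the same directory as this one. -->
<rdfcat xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="datadump.rdf"/>
  <xi:include href="properties.owl"/>
</rdfcat>
```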
The following command line uses libxslt's xsltproc program and
rdfcat.xml to combine the latest version of
datadump.rdf with the
properties.owl file into a file called
combo.rdf:
xsltproc --xinclude rdfcat.xsl rdfcat.xml > combo.rdf
The combo.rdf file is the combination of ontology metadata and formerly relational data that we've been working toward, and we can now run SPARQL queries on it that implement our use cases. The first query asks for all subjects that have a file://eudora/entries_workState value of "NY":
PREFIX e: <file://eudora/>
SELECT * WHERE {
?s e:entries_workState "NY"
}
Many SPARQL engines are available, but not all of them support OWL reasoning. Pellet is a free one that does, so I used it to run the query, saved here as nyworkers.spq:
pellet -if combo.rdf -ifmt RDF/XML -qf nyworkers.spq > nyworkers.out
Because of the equivalence relationship that you defined, Pellet should list the subjects for both the e:entries_workState triples and the out:entries_businessState triples that have a value of "NY". You won't know whether the result is correct unless you check how many of those are in each of your two databases. I found that the random data generation script put four or five in each database, so if this query retrieves more than six or seven, it's good news: no single database could have supplied that many on its own.
Pellet also outputs a few suggestions for OWL statements to make newdatadump.rdf a little more
OWL DL compliant. The rdfcat.xsl stylesheet adds rules to account for a few of these, and it could use a few more.
The following shows AAphone.spq, which asks for all phone numbers for Alfred Adams, whether
the database has his home phone, work phone, mobile phone, or any other phone numbers. The URL assigned to the
e: prefix is the one we used when defining the new phone property, the superproperty of the various phone number properties.
PREFIX e: <http://localhost/entries/>
PREFIX eud: <file://eudora/>
SELECT ?phoneType ?phone WHERE {
?s ?phoneType ?phone.
?s e:phone ?phone.
?s eud:entries_lastName "Adams".
?s eud:entries_firstName "Alfred".
}
The last use case, stored in FisherData.spq in the zip file, asks for all data about the Bobby Fisher entry, which is from the eudora database. Because we defined eudora:entries_email1 as an inverse functional property, Pellet knows that only one individual can have an email1 value of mailto:bobby416@gmail.com. Because we also defined eudora:entries_email1 as equivalent to out:entries_email2Address, Pellet can conclude that an outlook entry with that same address describes the same individual. It therefore pulls the Robert L. Fisher data from the outlook database for any other properties defined as equivalent to eudora properties.
PREFIX e: <file://eudora/>
PREFIX o: <file://outlook/>
SELECT * WHERE {
<http://localhost/addressbook#entries/Bobby/Fisher> ?p ?o
}
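The two declarations that this query depends on could be written in RDF/XML roughly as follows. This is a sketch rather than the markup SWOOP actually generated, and it assumes that the eudora: and out: prefixes expand to file://eudora/ and file://outlook/, matching the prefixes declared in the query above.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <!-- Inverse functional: a given email1 value can belong to only one
       individual, so two entries sharing it must be the same individual. -->
  <owl:InverseFunctionalProperty rdf:about="file://eudora/entries_email1">
    <!-- Equivalence: an outlook email2Address value counts as an email1 value. -->
    <owl:equivalentProperty rdf:resource="file://outlook/entries_email2Address"/>
  </owl:InverseFunctionalProperty>
</rdf:RDF>
```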
More Queries to Try
Metadata for metadata's sake does not justify the trouble of adding it. The goal with each bit of metadata was to let a user answer real address book questions more easily. Another classic OWL tweak to the ontology is to have the company home pages and other web addresses listed in the data represented as object properties instead of as datatype properties, but I couldn't think of a query that would then demonstrate how this metadata made the database more useful.
Another nice bit of database integration metadata that OWL can enable is the indication that a field in one database is not equivalent to a certain field in another database, but a subset of it. For example, if an international address book was incorporated into this data, we'd want to show that American zip codes are postal codes, but that not all postal codes are zip codes. This is done by making the zip code property a subproperty of the postal code property, much as you did with the phone properties. Then, a query against postal codes would also check zip codes, but not vice versa.
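A minimal sketch of such a declaration in RDF/XML. The http://example.org/addr# namespace and the zipCode and postalCode property names are hypothetical; neither address book in this demo defines them.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <!-- Hypothetical general property for any country's postal code. -->
  <owl:DatatypeProperty rdf:about="http://example.org/addr#postalCode"/>
  <!-- Every zipCode value is also a postalCode value, but not the reverse. -->
  <owl:DatatypeProperty rdf:about="http://example.org/addr#zipCode">
    <rdfs:subPropertyOf rdf:resource="http://example.org/addr#postalCode"/>
  </owl:DatatypeProperty>
</rdf:RDF>
```

With this in place, a reasoner answering a query on postalCode would treat each zipCode triple as a postalCode triple as well, while a query on zipCode would see only the zip codes.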
Demonstrating additional property attributes besides the inverse functional property would also be valuable. In a demo similar to this one, I added data indicating which entry represented the spouse of which other entry. With Jane Smith's entry pointing to Joe Smith as her spouse, "spouse" being defined as a symmetric property, and no home phone number or spouse listed for Joe, I could still query for his home phone number and get Jane's. This worked because I had defined a rule saying that if someone didn't have a home phone number but their spouse did, then a query for that person's home phone number should return the spouse's number. The definition of such rules is not a standard part of OWL, but Pellet supports them, and work toward a standard definition of such a rule language is underway.
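The symmetric part of that example can be sketched in RDF/XML as follows. The http://example.org/addr# namespace, the spouse property, and the two entry URIs are hypothetical stand-ins, and the home phone rule itself is left out because it falls outside standard OWL.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:ab="http://example.org/addr#">
  <!-- Symmetric: if A has spouse B, a reasoner infers that B has spouse A. -->
  <owl:SymmetricProperty rdf:about="http://example.org/addr#spouse"/>
  <!-- Only Jane's entry names a spouse; nothing is asserted for Joe. -->
  <rdf:Description rdf:about="http://example.org/addr#JaneSmith">
    <ab:spouse rdf:resource="http://example.org/addr#JoeSmith"/>
  </rdf:Description>
</rdf:RDF>
```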
To enhance data by adding metadata to it, a full dump of the data is not very practical. Instead of dumping all of the
relational data to an RDF representation each time the database is changed in order to allow Pellet to query an
up-to-date data/metadata combination, it would be nice to translate SPARQL queries on the fly to SQL queries, letting
us issue SPARQL queries directly against the relational data. This, in fact, is what D2RQ does, but D2RQ currently
offers no way to load ontology triples into the knowledge base along with the relational data, unless you want to try
storing the OWL statements in a relational table, which could be very interesting. D2RQ isn't the only project making
such technology available, but it is free, and as more software supports SPARQL and OWL, the combination will provide
us with some great new possibilities in getting more out of our relational databases.