Login | Register   
RSS Feed
Download our iPhone app
Browse DevX
Sign up for e-mail newsletters from DevX


Relational Database Integration with RDF/OWL : Page 2

Using the W3C OWL ontology standard lets you get more out of all kinds of data. Find out how this standard and some free software lets you query two databases as if they were one.

Using the D2RQ Interface
D2RQ includes several useful utility programs. You send your SPARQL requests to the main one, an HTTP server called d2r-server, which passes along the corresponding SQL statements to MySQL, Oracle, PostgreSQL, or another relational database manager. D2RQ needs a mapping file to convert your SPARQL request into the appropriate SQL query, and another D2RQ utility (generate-mapping) generates this mapping file for you in n3 format. You can use the generated mapping file as-is, but as you will see, various tweaks can improve it.

The following two commands tell generate-mapping to log in to MySQL, look at the data dictionary of the indicated databases, and generate the mapping files that let you issue SPARQL queries against the Eudora and Outlook databases (substitute your own MySQL username and password; also note that a carriage return was added after -p in each command for readability here, but each generate-mapping command is one line):

generate-mapping -o eudoraMapping.ttl -u myID -p myPassword jdbc:mysql://localhost/eudora generate-mapping -o outlookMapping.ttl -u myID -p myPassword jdbc:mysql://localhost/outlook

After doing this, you can start up the d2r-server program with one of these mapping files to experience what D2RQ can do. For example, enter the following in a command line window:

d2r-server eudoraMapping.ttl

I generated the mapping files and started up the server in the directory where the D2RQ programs were stored, so I didn't have to include any path names in that command. At this point, if you installed D2RQ using the default configuration, send a browser to the URL http://localhost:2020/ to start interactively using the D2RQserver. You will see a link for "entries," the single table in the database, and can click on it to start exploring the data. You can also enter your first SPARQL query by going to the SNORQL interface at http://localhost:2020/snorql/ and pasting the following query in, which says "show me all the information about the address book subject that has (328) 618-8442 as a mobile phone number:"

PREFIX eud: SELECT ?p ?o WHERE { ?s eud:entries_mobile "(328) 618-8442" . ?s ?p ?o . }

Combining the Two Databases
The next step is to go back to the command line window where you started up the D2RQ server with the d2r-server command. You'll see status messages that scrolled by as it executed your commands. Press Ctrl+C to abort the server. The next time you start it up, you'll have a new mapping file combined from eudoraMapping.ttl and outlookMapping.ttl that lets you query the two databases at once.

Each of the two mapping files you created assigns the namespace prefix map to a URL for the mapping file. To keep them straight in the combination file, change map: to emap: throughout eudoraMapping.ttl, and to omap: wherever it said map: in outlook.ttl. You don't have to do this to see the demo in action, because the resulting files are all included in the zip file, but you'll want to know about the steps I took to get there in case you do something similar with different databases. Some of the files may need to be rebuilt if one of the relevant software packages gets upgraded, because I've found that the generated URIs may evolve from one software release to another. You'll also need to change the d2rq:username and d2rq:password values in the mapping files to the username and password that you're using to log in to MySQL.

Both ttl files also assigned the prefix vocab to a namespace used for vocabulary terms, so to distinguish them in the combined file, change this prefix to eud throughout eudoraMapping.ttl and to out throughout outlookMapping.ttl. It's also a good idea to change the namespace URL—which wasn't a proper one to begin with—from vocab to eudora in one and from vocab to outlook in the other.

Next, combine the two files into one file named comboMapping.ttl. Move all the namespace declarations to the top and remove the redundant ones.

Below is one of the mapping entries in comboMapping.ttl that you need to edit:

emap:entries_email1 a d2rq:PropertyBridge; d2rq:belongsToClassMap emap:entries; d2rq:property eud:entries_email1; d2rq:column "entries.email1"; .

The mapping entry assigns various predicate/object pairs to the subject emap:entries_email1, which is a special object created for D2RQ: a d2rq:PropertyBridge. As the D2RQ User Manual and Language Specification tells us, property bridges "relate database table columns to RDF properties." As you can see in the entry above, the default generated property bridge is d2rq:column, which is for properties with literal values. A selection of alternatives is available, including d2rq:uriPattern for properties with URI values, which replaces the d2rq.column property in the revised version of the emap:entries_email1 entry below. I made the replacement because the RDF data model lets you represent values as either literal strings or URIs, and representing them as URIs is better because it makes it easier to make connections between different bits of data.

Telling D2RQ to treat an email address such as joe@whatever.com as a URI isn't enough, because software that treats it as a URI won't like its format—it doesn't look like a URI. The value of the new d2rq:uriPattern line that replaces the commented-out d2rq:column line below inserts a mailto: prefix so that the value really is a URI.

The final line of the revised version of the mapping entry tells D2RQ a condition for generating one of these mappings: that the data is not an empty string. If any address book entry is missing an email1 value, we don't want the string mailto: by itself being generated as the address book entry's entries_email1 value.

emap:entries_email1 a d2rq:PropertyBridge; d2rq:belongsToClassMap emap:entries; d2rq:property eud:entries_email1; # d2rq:column "entries.email1"; d2rq:uriPattern "mailto:@@entries.email1@@"; d2rq:condition "entries.email1 <> ''"; .

After making these changes to the emap:entries_email1 property bridge, make similar ones to the omap:email2Address one. For a more thorough database integration, you need to do this to all email addresses throughout combomapping.ttl.

At this point, the combomapping.ttl mapping file is ready, and you could start up the D2RQ server as shown below, and then do queries against either database:

d2r-server comboMapping.ttl

A better alternative is to create an ontology that allows a query tool to see the relationship between the two databases first, so you can do the queries described in the use cases above. SWOOP does the boring part of the ontology generation for you, so you can get to the interesting part. To create a version of this data to load into SWOOP, use another D2RQ utility to dump the data using the new mapping file:

dump-rdf -m comboMapping.rdf -o datadump.rdf

Comment and Contribute






(Maximum characters: 1200). You have 1200 characters left.



Thanks for your registration, follow us on our social networks to keep up-to-date