Login | Register   
RSS Feed
Download our iPhone app
Browse DevX
Sign up for e-mail newsletters from DevX


Relational Database Integration with RDF/OWL

Using the W3C OWL ontology standard lets you get more out of all kinds of data. Find out how this standard and some free software lets you query two databases as if they were one.

WL ontologies allow you to describe data and relationships between data items. Common examples of this include complex knowledge domains such as pharmacology, but you can use OWL ontologies with simple, straightforward data that most companies already have stored in a relational database package. When you add metadata to existing data, and then use that metadata to query the data collection, you get more value out of that data. Data is the most important asset of many organizations, so the use of standards-based technology to query data collections is becoming more and more attractive.

The primary goal of this article is to put together a demonstration of how you can use OWL to integrate two relational databases, and then perform queries against the aggregate collection to answer realistic questions that you could not answer without the addition of an OWL ontology.

Secondary goals include:

  • Using free, portable, open-source software
  • Making all the relevant files available so that others could reproduce the results (contained in the owlrdbms.zip file—see its readme.txt file for instructions on how to use its files to create and load the sample data)
  • Using address book data, which has relevance to everyone
  • Doing it all with no specialized programming (I did write a short XSLT 1.0 stylesheet utility, which is included in the zip file, but managed to avoid any coding that required compilation and deployment)
Use Cases
For sample data to load into the MySQL relational database package, I created two collections of address book data, one based on the fields used by the Eudora email program and another based on the Microsoft Outlook 2003 address book structure.

Despite the different field names used by the two packages (for example, businessState for the work address part of an Outlook address book entry and workState for the corresponding field in Eudora), I wanted to issue a single query against the aggregate collection of data to answer each of the following questions:

  • Who works in New York state?
  • List any phone numbers (home, mobile, business, and so forth) for Alfred Adams.
  • Find all information about Bobby Fischer at 2304 Eighth Lane, even if the other database lists him as Robert L. Fischer of 2304 8th Ln.
After the data was loaded into MySQL, the following steps made these queries possible:
  1. I used the D2RQ interface (release 0.4) to extract an RDF representation of the relational data.
  2. Using the open-source ontology editor SWOOP (release 2.3 beta 4), I automatically generated the mundane, repetitive parts of the ontology.
  3. Continuing with SWOOP, I did some pointing and clicking to identify relationships between fields in the two databases that enable the execution of the use case queries.
  4. I pulled the RDF/OWL syntax generated by SWOOP into its own file separate from the instance data to let me use it with future versions of the instance data as the address book databases get updated.
  5. Using the OWL reasoner Pellet (the most recent release is release 1.5.2, but that has problems with the third use case above, so I recommend 1.5.0 if you want to reproduce what is shown here), I issued queries in the W3C standard SPARQL query language against the data/metadata combination, getting more complete answers than would have been possible just querying against one of the databases without the metadata that identifies the connections between the two.
I did all this under Windows. With the exception of Eudora and Outlook, which I used to create the templates for the sample data, all the software described here is available for Linux as well.

Sample Data
I didn't want to use real address book data, so with a little scripting I generated a good amount of some fake, yet realistic-looking address book data:

  1. In each of the two email programs, I created a new address book entry and filled out every field.
  2. I saved the address book as a comma separated value (CSV) file and looked over the results.
  3. Using a Python script that included a list of the most popular given names, surnames, city names, and so forth in the U.S. (the zip file's readme.txt file lists the source of each list), I generated sample data with a structure that corresponded to the exported CSV file.
  4. I imported my outlook.csv file into Outlook to make sure everything ended up in the right place, and it did (Eudora offered no "import CSV" option).
  5. Because the SPARQL queries for the second and third use cases look for specific data among the randomly generated data, I added that data to the CSV files (the readme.txt included with the zip file has these lines and describes where to add them).
  6. I created an SQL script included with owlrdbms.zip to create the two MySQL databases (MySQL 5.0, under Windows) and loaded each CSV file into one of the databases.

Comment and Contribute






(Maximum characters: 1200). You have 1200 characters left.



Thanks for your registration, follow us on our social networks to keep up-to-date