Relational Database Integration with RDF/OWL : Page 3

Using the W3C OWL ontology standard lets you get more out of all kinds of data. Find out how this standard and some free software lets you query two databases as if they were one.


Creating an Ontology From the Data
If you look at datadump.rdf with a text editor, you'll see namespace declarations like this:

xmlns:j.0="file:outlook/" xmlns:j.1="file:eudora/"

You'll also see rdf:type elements with a value of file:outlook/entries or file:eudora/entries as their rdf:resource attribute value. A URL beginning with file: needs at least two slashes after it, so use your text editor to do a global replacement that adds two slashes after the string file: throughout datadump.rdf.
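If you would rather script the replacement than do it in a text editor, the following is a minimal sketch in Python. The function name add_file_slashes is made up for this illustration, and the sketch assumes that every occurrence of file: in the dump is the start of a URL that needs the fix. The negative lookahead skips any occurrence that already has two slashes, so running it twice is harmless.

```python
import re

def add_file_slashes(text: str) -> str:
    """Turn 'file:outlook/' into 'file://outlook/'.

    The lookahead (?!//) leaves URLs that already start with 'file://'
    untouched; \\b keeps words that merely end in 'file' (e.g. 'profile:')
    from matching.
    """
    return re.sub(r"\bfile:(?!//)", "file://", text)

# The namespace declarations quoted in the article.
sample = 'xmlns:j.0="file:outlook/" xmlns:j.1="file:eudora/"'
fixed = add_file_slashes(sample)
print(fixed)

# To apply it to the real file (assumes datadump.rdf is in the current directory):
# from pathlib import Path
# path = Path("datadump.rdf")
# path.write_text(add_file_slashes(path.read_text(encoding="utf-8")), encoding="utf-8")
```

Keep a backup copy of datadump.rdf before overwriting it in place.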

Now, SWOOP can create a simple ontology from your data dump. Start it, and pick Load/Ontology from the File menu. You're not really loading an ontology, but instead the RDF file that you pulled from D2RQ: datadump.rdf. It might take a few minutes, so to speed this up you could edit down the datadump.rdf file to only include the data from one record of each database. SWOOP builds an ontology out of the properties that it sees, so it only needs to see one example of each property.



After you've loaded the data file, immediately click Save As from the File menu and save the file as postSwoop.rdf.

Look at both datadump.rdf and postSwoop.rdf with a text editor. Although you didn't do anything to the datadump.rdf "ontology" that you loaded, SWOOP added OWL declarations to postSwoop.rdf for all the properties it found, like the following:

<owl:DatatypeProperty rdf:about="&eudora;entries_workZip"/>
<owl:DatatypeProperty rdf:about="&eudora;entries_zip"/>
<owl:DatatypeProperty rdf:about="&outlook;entries_ISDN"/>
<owl:DatatypeProperty rdf:about="&outlook;entries_TTYTDDPhone"/>

The entity references &eudora; and &outlook; stand in for the strings file://eudora/ and file://outlook/ to make the rdf:about values proper URLs, which RDF requires for identifiers. Now you can use SWOOP to enhance the ontology so that SPARQL queries against the combined databases can do things that SQL couldn't do to the MySQL versions of the databases. We'll start by adding a statement to the ontology that says that the workState column of the eudora database's entries table is equivalent to the businessState column of the outlook database's entries table.
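To see what the entity references expand to, you can let an XML parser resolve them. The sketch below uses Python's standard xml.etree.ElementTree on a small hand-written document. The DOCTYPE block with the two ENTITY declarations is an assumption about how the saved file declares &eudora; and &outlook;, so check the top of your own postSwoop.rdf for the actual declarations.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"

# Hand-written stand-in for postSwoop.rdf; the entity declarations are assumed.
doc = """<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY eudora "file://eudora/">
  <!ENTITY outlook "file://outlook/">
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:DatatypeProperty rdf:about="&eudora;entries_zip"/>
  <owl:DatatypeProperty rdf:about="&outlook;entries_ISDN"/>
</rdf:RDF>
"""

root = ET.fromstring(doc)

# The parser has already replaced each entity reference with its declared text,
# so rdf:about holds the full URL.
uris = [
    el.get(f"{{{RDF_NS}}}about")
    for el in root.iter(f"{{{OWL_NS}}}DatatypeProperty")
]
print(uris)
```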

Select the Property Tree tab in the lower left of SWOOP and click on "entries_workState." If you click the RDF/XML tab near the top, you'll see an RDF/XML representation of this property's declaration, which looks like the DatatypeProperty declarations above. Among other things, this shows that this property is part of the file://eudora/ namespace.

By using SWOOP, you don't have to deal with RDF/XML, so click the Concise Format tab to return to the display that is easier to read and interact with. Click on Add next to "Equivalent to:," select "entries_businessState" on the Specify Property dialog box that appears, and click the Add button, and then the Cancel button to show that you're done with the Specify Property dialog box (see Figure 1). You'll see "entries_businessState" appear under "Equivalent to:" on the main pane, but it's not official until you click the Apply Changes button at the bottom. Save the edited ontology by picking Save from the File menu or by pressing Ctrl+S.

 
Figure 1. Using SWOOP: After you indicate that the entries_businessState property is equivalent to entries_workState, the EquivalentProperty section of the information for entries_workState reflects your change.
If you search for "equivalent" in postSwoop.rdf, you'll see how SWOOP saved this; it changed the following:

<owl:DatatypeProperty rdf:about="&eudora;entries_workState"/>

into this:

<owl:DatatypeProperty rdf:about="&eudora;entries_workState">
  <owl:equivalentProperty rdf:resource="&outlook;entries_businessState"/>
</owl:DatatypeProperty>
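As the ontology grows, searching the file by hand for "equivalent" gets tedious. Here is a rough sketch of a script that lists every equivalence recorded in this form. The function name equivalent_pairs is invented for the example, the sample document spells out the full URLs instead of using entity references, and the script only looks at owl:equivalentProperty elements nested directly inside owl:DatatypeProperty elements, which is the shape shown above.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
OWL = "{http://www.w3.org/2002/07/owl#}"

def equivalent_pairs(rdf_xml: str) -> list[tuple[str, str]]:
    """Return (property, equivalent property) URL pairs found in the RDF/XML."""
    root = ET.fromstring(rdf_xml)
    pairs = []
    for prop in root.iter(OWL + "DatatypeProperty"):
        # Only direct children are checked, matching the declaration shown above.
        for eq in prop.findall(OWL + "equivalentProperty"):
            pairs.append((prop.get(RDF + "about"), eq.get(RDF + "resource")))
    return pairs

# Hand-written sample with the entity references already expanded.
sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:DatatypeProperty rdf:about="file://eudora/entries_workState">
    <owl:equivalentProperty rdf:resource="file://outlook/entries_businessState"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:about="file://eudora/entries_zip"/>
</rdf:RDF>"""

pairs = equivalent_pairs(sample)
for left, right in pairs:
    print(left, "is equivalent to", right)
```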

RDF/XML representations of OWL statements are not difficult to read, but they are verbose, which is why a tool like SWOOP makes editing them much easier.

For a more thorough job of integrating the two databases, also define eudora:entries_firstName as equivalent to outlook:entries_firstName, eudora:entries_lastName as equivalent to outlook:entries_lastName, and as many other equivalences as you can find. The more relationships you identify between the two databases, the more tightly they'll be integrated. The Bobby Fischer use case works better if you define at least these two.

To make it possible to find all of Alfred Adams' phone numbers, regardless of which ones we have stored for him, we want to indicate that all the phone properties have some semantics in common. To do this, create a new phone property and then make all other phone properties subproperties of that. Click the Add P button with a yellow "P" at the left of the SWOOP workspace to display the New Entity dialog box. Set the Property Type to OWL Datatype Property, because phone numbers are simple strings. Set subProperty-of to None, which is the first choice, because our new property isn't a subproperty of any other. Based on the URIs that D2RQ generated from my address book, I replaced the default Logical URI on the dialog box with http://localhost/entries/phone. I left the other fields blank. Click the dialog box's Add & Close button, and "phone" appears at the bottom of the Property Tree. If it doesn't appear, you can click Remove and start again.

After you add this new property, make sure it's selected, and then click Add next to "Superproperty of." OWL has no "superproperty" property—when you tell SWOOP that entries:phone is a superproperty of outlook:entries_homePhone, it stores outlook:entries_homePhone as a subproperty of entries:phone. It's faster in the SWOOP interface to indicate that entries:phone is a superproperty of 15 other properties than it is to go to each of those 15 and mark it as a subproperty of entries:phone. In the Specify Property dialog box, Ctrl-click lets you select multiple properties as subproperties of the new phone property. Select the following, and then click Add to include them in the main window and then click Cancel when you're finished with the Specify Property dialog box. You'll see "entries_otherPhone" twice on the list, once from each of the Eudora and Outlook namespaces. Select both.

entries_businessFax
entries_businessPhone
entries_businessPhone2
entries_carPhone
entries_homeFax
entries_homePhone
entries_homePhone2
entries_mobile
entries_mobilePhone
entries_otherPhone
entries_phone
entries_primaryPhone
entries_workMobile
entries_workPhone

Click Apply Changes and save your work.
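To make the effect of the subproperty declarations concrete, here is a small Python sketch of the rule a reasoner applies: a value stored under a subproperty also counts as a value of its superproperty. Everything in it is illustrative. The subject identifiers, phone numbers, and prefixed names are invented, the function values_for is hypothetical, and only direct subproperties are followed. It is not how Pellet is implemented.

```python
# Invented subproperty declarations, in the spirit of the ones made in SWOOP.
SUBPROPERTY_OF = {
    "outlook:entries_homePhone": "entries:phone",
    "outlook:entries_mobilePhone": "entries:phone",
    "eudora:entries_workPhone": "entries:phone",
}

# Invented (subject, property, value) triples standing in for the data dump.
TRIPLES = [
    ("person:1", "outlook:entries_homePhone", "555-0100"),
    ("person:1", "outlook:entries_mobilePhone", "555-0101"),
    ("person:1", "outlook:entries_businessCity", "Springfield"),
    ("person:2", "eudora:entries_workPhone", "555-0199"),
]

def values_for(subject: str, prop: str) -> list[str]:
    """Values of `prop` for `subject`, counting direct subproperties of `prop`."""
    matching = {prop} | {sub for sub, sup in SUBPROPERTY_OF.items() if sup == prop}
    return [o for s, p, o in TRIPLES if s == subject and p in matching]

# Asking for the general phone property picks up both specific phone properties
# and ignores the unrelated businessCity value.
phones = values_for("person:1", "entries:phone")
print(phones)
```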

To let Pellet discover that Bobby Fischer and Robert L. Fischer are the same person, you've already done half the work by adding new information about the emap:entries_email1 and omap:entries_email2Address property bridges in the mapping file. We need to have the ontology specify that the properties used for personal email addresses are inverse functional properties, which means that only one instance of a class can have a particular email1 value. This way, an email1 value of "bobby416@gmail.com" for both Bobby and Robert means that they're the same person.
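The inverse functional rule itself is simple enough to sketch in a few lines of Python: if two records share a value for such a property, they describe the same individual. The record identifiers, the mailto: form of the addresses, and the Alfred Adams address are assumptions made up for this illustration, and same_individuals is a hypothetical helper, not part of Pellet.

```python
from collections import defaultdict

# Invented records: (identifier, name, personal email as a URL).
RECORDS = [
    ("eudora:entry1", "Bobby Fischer", "mailto:bobby416@gmail.com"),
    ("outlook:entry7", "Robert L. Fischer", "mailto:bobby416@gmail.com"),
    ("outlook:entry8", "Alfred Adams", "mailto:alfred@example.org"),
]

def same_individuals(records):
    """Group identifiers that share an email value.

    Because the email property is treated as inverse functional, each group
    with more than one member is a set of records for one individual.
    """
    by_email = defaultdict(list)
    for ident, _name, email in records:
        by_email[email].append(ident)
    return [ids for ids in by_email.values() if len(ids) > 1]

merged = same_individuals(RECORDS)
print(merged)
```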

When using OWL DL, a property that is an inverse functional property must be an object property, not a datatype property (that is, its value must be a URL and not a string, which is why we tweaked the mapping file for this entry the way we did). Select "entries_email1" in the Property Tree tab and then click Add next to Attributes at the bottom of the main pane. Select Inverse Functional and the Yes button on the Specify Property Attribute dialog box, and then click Apply Changes at the bottom of the main pane and save your work. Follow these same steps to designate "entries_email2Address" as an inverse functional property, and then define "entries_email2Address" and "entries_email1" as equivalent, following the same steps that you used to define the equivalent property pairs earlier.


