Creating an Ontology From the Data
If you look at datadump.rdf with a text editor, you'll see namespace declarations like this:
xmlns:j.0="file:outlook/"
xmlns:j.1="file:eudora/"
You'll also see rdf:type elements with a value of file:outlook/entries or file:eudora/entries as their rdf:resource attribute value. A URL beginning with file: needs at least two slashes after it, so use your text editor to do a global replacement that adds two slashes after the string file: throughout datadump.rdf.
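If you'd rather script the replacement than do it by hand, the following is a minimal Python sketch of the same edit. The function name add_slashes and the sample string are illustrative only. The pattern skips any file: that is already followed by two slashes, so running it twice is harmless.

```python
import re

# Match "file:" as a whole word, but only when it is not already
# followed by two slashes (so the fix is safe to apply repeatedly).
FILE_SCHEME = re.compile(r"\bfile:(?!//)")

def add_slashes(text: str) -> str:
    """Return text with every bare "file:" rewritten as "file://"."""
    return FILE_SCHEME.sub("file://", text)

# Sample input modeled on the namespace declarations shown above.
sample = 'xmlns:j.0="file:outlook/" xmlns:j.1="file:eudora/"'
print(add_slashes(sample))
# prints: xmlns:j.0="file://outlook/" xmlns:j.1="file://eudora/"
```

To apply it to the real file, read datadump.rdf as text, pass the contents through add_slashes, and write the result back out (keep a backup copy first).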
Now, SWOOP can create a simple ontology from your data dump. Start it, and pick Load/Ontology from the File menu.
You're not really loading an ontology, but instead the RDF file that you pulled from D2RQ: datadump.rdf. It might take a few minutes, so to speed this up you could edit down the datadump.rdf file to only include the data from one record of each database. SWOOP builds an ontology out of the properties that it sees, so it only needs to see one example of each property.
After you've loaded the data file, immediately click Save As from the File menu and save the file as postSwoop.rdf.
Look at both datadump.rdf and postSwoop.rdf with a text editor.
Although you didn't do anything to the datadump.rdf "ontology" that you loaded, SWOOP added
OWL declarations to postSwoop.rdf for all the properties it found, like the following:
<owl:DatatypeProperty rdf:about="&eudora;entries_workZip"/>
<owl:DatatypeProperty rdf:about="&eudora;entries_zip"/>
<owl:DatatypeProperty rdf:about="&outlook;entries_ISDN"/>
<owl:DatatypeProperty rdf:about="&outlook;entries_TTYTDDPhone"/>
The entity references &eudora; and &outlook; stand in for the strings file://eudora/ and file://outlook/ to make the rdf:about values proper URLs, which RDF requires for identifiers.
Now you can use SWOOP to enhance the ontology so that SPARQL queries against the combined databases can do things that SQL couldn't do with the MySQL versions of the databases. We'll start by adding something to the ontology that says that the workState column of the eudora database's entries table is equivalent to the businessState column of the outlook database's entries table.
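To see what the &eudora; and &outlook; entity references expand to, you can run a small document through an XML parser. The sketch below uses Python's standard xml.etree.ElementTree module. The DOCTYPE block is an assumption about what the entity declarations at the top of postSwoop.rdf look like, so check your own file for the exact text.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt: the entity declarations are assumed, not copied
# from a real postSwoop.rdf file.
DOC = """<!DOCTYPE rdf:RDF [
  <!ENTITY eudora "file://eudora/">
  <!ENTITY outlook "file://outlook/">
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:DatatypeProperty rdf:about="&eudora;entries_zip"/>
  <owl:DatatypeProperty rdf:about="&outlook;entries_ISDN"/>
</rdf:RDF>"""

# ElementTree spells namespaced attribute names as {namespace-uri}local-name.
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"

root = ET.fromstring(DOC)

# The parser expands the entity references while reading the attributes,
# so each rdf:about value comes back as a full URL.
uris = [prop.get(RDF_ABOUT) for prop in root]
print(uris)
# prints: ['file://eudora/entries_zip', 'file://outlook/entries_ISDN']
```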
Select the Property Tree tab in the lower left of SWOOP and click on “entries_workState.” If you click the RDF/XML tab
near the top, you'll see an RDF/XML representation of this property's declaration, which looks like the
DatatypeProperty declarations above. Among other things, this shows that this property is part of the file://eudora/ namespace.
By using SWOOP, you don't have to deal with RDF/XML, so click the Concise Format tab to return to the display that is
easier to read and interact with. Click Add next to "Equivalent to:", select "entries_businessState" in the Specify Property dialog box that appears, click the Add button, and then click the Cancel button to show that you're done with the Specify Property dialog box (see Figure 1). You'll see "entries_businessState" appear under "Equivalent to:" on the
main pane, but it's not official until you click the Apply Changes button at the bottom. Save the edited ontology by
picking Save from the File menu or by pressing Ctrl+S.
Figure 1. Using SWOOP: After you indicate that the entries_businessState property is equivalent to entries_workState, the EquivalentProperty section of the information for entries_workState reflects your change.
If you search for "equivalent" in postSwoop.rdf, you'll see how SWOOP saved this; it changed the following:
<owl:DatatypeProperty rdf:about="&eudora;entries_workState"/>
into this:
<owl:DatatypeProperty rdf:about="&eudora;entries_workState">
<owl:equivalentProperty rdf:resource="&outlook;entries_businessState"/>
</owl:DatatypeProperty>
RDF/XML representations of OWL statements are not difficult to read, but they are verbose, which is why a tool like SWOOP makes editing them much easier.
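To get a feel for what the owl:equivalentProperty statement buys you, here is a toy Python sketch of the inference a reasoner could draw from it. This is not how Pellet works internally. The record identifiers and state values are invented for illustration, and only direct equivalences are followed.

```python
EUDORA = "file://eudora/"
OUTLOOK = "file://outlook/"

# Invented sample triples: (subject, property, value).
triples = [
    ("eudora-entry-1", EUDORA + "entries_workState", "NY"),
    ("outlook-entry-7", OUTLOOK + "entries_businessState", "NY"),
    ("outlook-entry-8", OUTLOOK + "entries_businessState", "CA"),
]

# The equivalence that was just added to the ontology.
equivalent = {
    (EUDORA + "entries_workState", OUTLOOK + "entries_businessState"),
}

def equivalents_of(prop):
    """Return prop plus every property declared directly equivalent to it."""
    names = {prop}
    for a, b in equivalent:
        if a == prop:
            names.add(b)
        if b == prop:
            names.add(a)
    return names

def subjects_with(prop, value):
    """Find subjects whose value for prop, or for an equivalent, matches."""
    props = equivalents_of(prop)
    return sorted(s for s, p, o in triples if p in props and o == value)

# Asking about the Eudora property also finds the Outlook record.
print(subjects_with(EUDORA + "entries_workState", "NY"))
# prints: ['eudora-entry-1', 'outlook-entry-7']
```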
For a more thorough job of integrating the two databases, you can also define eudora:entries_firstName as equivalent to outlook:entries_firstName, eudora:entries_lastName as equivalent to outlook:entries_lastName, and many other equivalencies. The more relationships you can identify between the two databases, the more tightly they'll be integrated. The Bobby Fischer use case works better if you define at least these two.
To make it possible to find all of Alfred Adams' phone numbers, regardless of which ones we have stored for him, we
want to indicate that all the phone properties have some semantics in common. To do this, create a new phone property
and then make all other phone properties subproperties of that. Click the Add P button with a yellow "P" at the left
of the SWOOP workspace to display the New Entity dialog box. Set the Property Type to OWL Datatype Property, because
phone numbers are simple strings. Set subProperty-of to None, which is the first choice, because our new property
isn't a subproperty of any other. Based on the URIs that D2RQ generated from my address book, I replaced the default
Logical URI on the dialog box with http://localhost/entries/phone. I left the other fields blank. Click the dialog box's Add & Close button, and "phone" appears at the bottom of the Property Tree. If it doesn't appear, you can click Remove and start again.
After you add this new property, make sure it's selected, and then click Add next to "Superproperty of." OWL has no "superproperty" property; when you tell SWOOP that entries:phone is a superproperty of outlook:entries_homePhone, it stores outlook:entries_homePhone as a subproperty of entries:phone. It's faster in the SWOOP interface to indicate that entries:phone is a superproperty of 15 other properties than it is to go to each of those 15 and mark it as a subproperty of entries:phone. In the Specify Property dialog box, Ctrl-click lets you select multiple properties as subproperties of the new phone property. Select the following properties, click Add to include them in the main window, and then click Cancel when you're finished with the Specify Property dialog box. You'll see "entries_otherPhone" twice on the list, once from each of the Eudora and Outlook namespaces; select both.
entries_businessFax
entries_businessPhone
entries_businessPhone2
entries_carPhone
entries_homeFax
entries_homePhone
entries_homePhone2
entries_mobile
entries_mobilePhone
entries_otherPhone
entries_phone
entries_primaryPhone
entries_workMobile
entries_workPhone
Click Apply Changes and save your work.
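The effect of the subproperty relationships can be sketched in a few lines of Python. This is a simplified illustration, not the reasoner's actual algorithm. The subject name, the phone numbers, and the choice of which namespace each property belongs to are placeholders, and only one level of subproperty is followed.

```python
PHONE = "http://localhost/entries/phone"

# A few of the subproperty links described above (child -> parent).
# Which namespace each property lives in is assumed for this example.
sub_property_of = {
    "file://outlook/entries_homePhone": PHONE,
    "file://outlook/entries_mobilePhone": PHONE,
    "file://eudora/entries_workPhone": PHONE,
}

# Invented sample data: (subject, property, value).
triples = [
    ("AlfredAdams", "file://outlook/entries_homePhone", "555-0101"),
    ("AlfredAdams", "file://eudora/entries_workPhone", "555-0102"),
    ("AlfredAdams", "file://eudora/entries_zip", "10001"),
]

def values_of(subject, prop):
    """Values stated with prop itself or with a direct subproperty of it."""
    return sorted(
        value
        for s, p, value in triples
        if s == subject and (p == prop or sub_property_of.get(p) == prop)
    )

# One question about the general phone property returns every kind of
# phone number on record, and leaves out the zip code.
print(values_of("AlfredAdams", PHONE))
# prints: ['555-0101', '555-0102']
```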
To let Pellet discover that Bobby Fischer and Robert L. Fischer are the same person, you've already done half the
work by adding new information about the emap:entries_email1 and
omap:entries_email2Address property bridges in the mapping file. We need to have the ontology
specify that the properties used for personal email addresses are inverse functional properties, which means that
only one instance of a class can have a particular email1 value. This way, an email1 value of "bobby416@gmail.com" for
both Bobby and Robert means that they're the same person.
When using OWL DL, a property that is an inverse functional property must be an object property, not a datatype
property (that is, a URL and not a string, which is why we tweaked the mapping file for this entry the way we did).
Select "entries_email1" in the Property Tree tab and then click Add next to Attributes at the bottom of the main pane.
Select Inverse Functional and the Yes button on the Specify Property Attribute dialog box, and then click Apply
Changes at the bottom of the main pane and save your work. Follow these same steps to designate "entries_email2Address" as an inverse functional property, and then define "entries_email2Address" and "entries_email1" to be equivalent, following the same steps that you used to define the equivalent property pairs earlier.
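The following Python sketch illustrates the reasoning that these two declarations make possible. It is a toy model rather than Pellet's implementation. The mailto: form of the address, the pairing of each name with a particular database, and Alfred Adams' address are assumptions made for the example.

```python
EMAIL1 = "file://eudora/entries_email1"
EMAIL2 = "file://outlook/entries_email2Address"

# Both properties were marked inverse functional in the ontology.
inverse_functional = {EMAIL1, EMAIL2}

# The two properties were declared equivalent, so map both to one name.
canonical = {EMAIL2: EMAIL1}

# Invented sample triples: (subject, property, value).
triples = [
    ("Bobby Fischer", EMAIL1, "mailto:bobby416@gmail.com"),
    ("Robert L. Fischer", EMAIL2, "mailto:bobby416@gmail.com"),
    ("Alfred Adams", EMAIL1, "mailto:alfred@example.com"),
]

def same_as_pairs(statements):
    """Pair up subjects that share a value for an inverse functional property."""
    first_owner = {}
    pairs = []
    for subject, prop, value in statements:
        if prop not in inverse_functional:
            continue
        key = (canonical.get(prop, prop), value)
        # Remember the first subject seen with this value; any later
        # subject with the same value must be the same individual.
        owner = first_owner.setdefault(key, subject)
        if owner != subject:
            pairs.append((owner, subject))
    return pairs

print(same_as_pairs(triples))
# prints: [('Bobby Fischer', 'Robert L. Fischer')]
```

Without the equivalence between the two email properties, the shared address would sit under two unrelated property names and no match would be found, which is why both declarations are needed.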