There are many common problems that occur when corporations expand beyond their initial ontologies and start to build multiple ontologies that must remain consistent. Here is a list of the common problems covered in this article:
- Tools – Use the right tools to build your ontology.
- Duplicating Data Elements – What do you do when two separate structures in your ontology represent the same concept?
- Role Pollution – What do you do when the role of a person or object becomes the class name?
- Mixing Processes for Semantics and Constraints – Learn how to use a single process for meaning and exchange-specific constraints.
- Untested Upper Ontologies – What do you do when critical upper ontology classes do not work as they were designed?
- Ambiguous Definitions – Learn how to write precise definitions for classes, properties, and values.
- Mixing Definitions and Descriptions – Definitions are critical because they get high visibility in many tools.
- Poor Search – Users need to find what they are looking for, especially if it already exists.
- Poor Reporting – Reports, such as a list of all unapproved properties in a project, help you prioritize your work.
- Lack of Versioning and Traceability – Knowing who created a property and in what context can help you determine the intended purpose of a property.
- Lack of Code-Level Semantics – Knowing the meaning of classes and properties is necessary but not sufficient. Knowing the enumerated values of codes used in properties is just as critical.
Using Tools to Help Design Ontologies
There are many products today that claim to allow you to design ontologies. Stanford University’s widely used open source Protégé ontology editor (see Figure 1) and Altova’s SemanticWorks (see Figure 2) are both good examples of ontology design tools. Beyond basic editing, look for tools that also let you:
- Track document history
- Track versioning
- Search for data
- Create reports of what documents were created by what individuals
- View timelines of when groups of data were created
Central to many of these tools is the creation of smaller discrete structures that bear semantic information but are reused in many ontologies. But in large organizations, shared meaning only comes through shared trust. If people do not trust the processes behind your ontology, they will not use it and will tend to re-invent its structures.
|Author’s Note: The term “data element” within this article refers to the fine-grained structures that can be managed in an ontology. If you are familiar with OWL, these items include classes, properties, relationships, and range values.
Duplicating Data Elements
One of the first tests of a high-quality ontology is to look for duplication of data elements. The larger an ontology grows, the higher the probability that an untrained user will enter a new data element that already exists in your corporate ontology, usually because the user is not aware it is already there. If your ontology management system has a search tool, you can train new users on how to use it. But search tools alone are usually not enough. For example, a user might search for a term such as “Individual,” not knowing that you stored information about human beings under the class “Person.” Using keywords and synonyms is one way to ensure that search tools find and display the data elements a user needs.
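The keyword-and-synonym approach can be sketched as a small lookup that maps search terms to canonical class names; the synonym table and class names below are hypothetical:

```python
# Sketch of a synonym-aware search over ontology class labels.
# The synonym table and class names are illustrative, not a real ontology.
SYNONYMS = {
    "individual": "person",
    "human": "person",
    "doctor": "physician",
}

ONTOLOGY_CLASSES = {
    "person": "Person",
    "physician": "Physician",
}

def find_class(term):
    """Resolve a search term to a class label, falling back to synonyms."""
    key = term.lower()
    key = SYNONYMS.get(key, key)  # map synonyms to the canonical keyword
    return ONTOLOGY_CLASSES.get(key)

print(find_class("Individual"))  # a search for "Individual" finds Person
```

A real ontology management system would maintain the synonym table alongside each data element, so a search for “Individual” surfaces the existing “Person” class instead of inviting a duplicate.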
The second line of defense against accidental duplication of data elements is a human-centric review process. Most ontologies have a few expert users that are familiar with the structure and conventions used in an ontology. When a novice user adds a new data element to an ontology an e-mail or other notification message can be sent to experts alerting them that new data elements are pending their review.
In large standards bodies, this review process is usually directed to a committee of experts who have a specialized understanding of specific parts of the ontology. In financial institutions, some members might specialize in stock transactions and some in bonds. The key is to have a clearinghouse to assign data elements to the group with the most expertise. This is one of the central aspects of data governance and data stewardship that must be in place for the ontology to gain enterprise respect and usage.
Role Pollution
A common mistake a novice ontologist can make is confusing a role with an actual object. For example, a person may play many roles in a business event. In healthcare, a person might play the role of a patient, a nurse, a physician, or an office assistant. It seems obvious at first to take a form that has the label “PatientName” and create a property of patient-name in your ontology. Then you might add physician-name, nurse-name, and office-assistant-name. The key is to realize that these labels on the medical forms reflect the role that a “Person” plays in the business event. To create precise rules around names, you want to remove the role from the name and create subclasses of Person for each of these roles. The PersonGivenName and PersonFamilyName can then be properties of Person, and your business rules for validating these names can be shared. You can then create a PersonRoleCode property and assign the appropriate value, such as PersonRoleCode=”Patient”. Removing roles from properties is one of the best ways to keep your ontologies reasonable.
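Assuming roles are modeled as a code on a shared Person structure, a minimal sketch (all names here are illustrative) shows how one set of name-validation rules is reused across every role:

```python
# Sketch: modeling the role as a code on Person rather than baking the
# role into property names. Class and field names are illustrative.
from dataclasses import dataclass

@dataclass
class Person:
    given_name: str    # PersonGivenName: one shared validation rule
    family_name: str   # PersonFamilyName
    role_code: str     # PersonRoleCode: "Patient", "Physician", ...

VALID_ROLES = {"Patient", "Physician", "Nurse", "OfficeAssistant"}

def validate(person):
    # The same name rules apply regardless of the role being played.
    return (bool(person.given_name)
            and bool(person.family_name)
            and person.role_code in VALID_ROLES)

patient = Person("Ada", "Lovelace", "Patient")
print(validate(patient))  # True: one rule set covers every role
```

Contrast this with four separate patient-name, physician-name, nurse-name, and office-assistant-name properties, each needing its own copy of the validation logic.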
Mixing Processes for Semantics and Constraints
When designing web services, you typically create XML Schemas to validate incoming data elements. You can automate the process of creating XML Schemas by extracting a subset of the elements from your ontology. You can transform these subsets from OWL and RDF directly into XML Schema files that are imported into other XML Schemas or WSDL files.
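A minimal sketch of this extraction step, using only the Python standard library (the element names are made up; a real pipeline would read them from your OWL/RDF store and likely use an XSLT transform):

```python
# Sketch: generating a minimal XML Schema fragment from data elements
# extracted from an ontology. Element names are hypothetical.
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# In practice, this list would be extracted from your OWL/RDF store.
elements = [
    ("PersonGivenName", "xs:string"),
    ("PersonFamilyName", "xs:string"),
]

schema = ET.Element(f"{{{XS}}}schema")
for name, xsd_type in elements:
    el = ET.SubElement(schema, f"{{{XS}}}element")
    el.set("name", name)
    el.set("type", xsd_type)

print(ET.tostring(schema, encoding="unicode"))
```

The generated fragment can then be imported by exchange-specific schemas or WSDL files, so the element names and types stay anchored to the ontology.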
|Figure 3. Mapping Ontology into an XML Schema:This figure shows an example of an XML Schema diagram.
In the past, the process of creating an XML Schema was often used to define the meaning of data elements, so when an XML Schema changed, the definitions of the data elements became entangled with the constraints of that specific schema structure. See Figure 3 for an example of an XML Schema diagram. This is where your ontologies come in. You can use ontologies as a central location to store the semantics, or meaning, of the data elements that live on the leaves of XML documents. When definitions are stored in a well-controlled, centralized corporate ontology, they outlive the needs of any specific version of a web service. The data elements have individual histories:
- Creation dates
- Approval workflow status
- Approval committees
- Revisions and date-stamps of when they were approved for corporate usage
On the other hand, you should not view XML Schemas as containers of the semantics of data elements. XML Schemas are containers of the data elements themselves; each one expresses the order and cardinality of a collection of elements. XML Schemas are the constraints of a specific data exchange. For example, a single developer can add and delete web services for data subscribers. Your job as a corporate ontologist is to support such activities, maintain semantics, and stay out of the way of a business unit that has its own set of required fields that must be present as inputs to its web services.
So when people have questions about the meaning of data, that is the ontologist’s signal to step in and bring the tools to build semantic precision. But if a problem has to do with which elements are present, what order they appear in within a transaction, which data elements are required, and which ones are optional, it is best to let the data publisher and subscriber work things out between themselves.
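This separation can be sketched as one shared definition reused by two exchange-specific constraint sets; the element, schemas, and field names below are all hypothetical:

```python
# Sketch: one central definition, many exchange-specific constraints.
# The ontology stores meaning; each schema stores order/cardinality.
# All names and entries here are illustrative.
ONTOLOGY = {
    "PersonGivenName": {
        "definition": "The first name given to a person.",
        "created": "2007-01-15",
        "status": "approved",
    },
}

# Two web services reuse the same element under different constraints.
BILLING_SCHEMA = {"PersonGivenName": {"required": True, "max_occurs": 1}}
SURVEY_SCHEMA = {"PersonGivenName": {"required": False, "max_occurs": 1}}

def describe(element, schema):
    meaning = ONTOLOGY[element]["definition"]  # shared semantics
    rule = schema[element]                     # exchange-specific constraint
    return meaning, rule

meaning, rule = describe("PersonGivenName", BILLING_SCHEMA)
print(meaning)  # the same definition, whichever schema asks
```

Changing the billing service’s cardinality rules touches only `BILLING_SCHEMA`; the definition in the ontology, and every other exchange that uses it, is untouched.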
Untested Upper Ontologies
Upper ontology classes are some of the most critical parts of your corporate ontology. These are the root classes that are either direct subclasses of the OWL Thing class or second-level subclasses of Thing. Teams frequently get into heated arguments about the pros and cons of these upper ontologies, and there are many complex tradeoffs around their depth and breadth. Some of these issues are worthy of long discussion because they have long-term impact: computer systems that have similar upper ontologies will have much lower integration costs, and if upper ontologies are stable, people will develop trust in the systems. They are the anchors of your semantics and the foundation of your building. Change them frequently and you will quickly lose the trust of your stakeholders. The first myth about upper ontologies is that it is impossible to “test” their usefulness. This is simply not the case. Here is a simple method to test your upper ontologies:
- Create a simple one-page handout that describes your upper ontologies.
- Give each class a label and a short description.
- If necessary, provide short explanations of what types of subclasses and properties will be placed under each class.
- Then take a list of around 100 subclasses and properties and ask a group of around 10 business analysts to classify each of the subclasses and properties using one of your upper ontologies.
- If each business analyst classifies the subclasses and properties the way you designed the ontology, you have a winner.
- If their classifications are inconsistent, you need to go back to the drawing board and look at your ontology again.
An upper ontology is like a high-level sieve. Data elements come pouring out of requirements like little grains of sand and need to be sorted correctly by the “uppers.” Even a novice that is unfamiliar with your ontology should have the ability to guess how the data elements are sorted.
Repeat the testing process until approximately 95 percent of all data elements are sorted into the correct class. If you do this, you can confidently tell your management team that the ontology is not just a personal interpretation of how elements should be classified: it is based on a repeatable testing process.
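The scoring step of this test can be sketched as a small script; the answer key and analyst answers below are hypothetical:

```python
# Sketch: scoring the upper-ontology classification test.
# The answer key and analyst responses are hypothetical.
def agreement_rate(answer_key, analyst_answers):
    """Fraction of (element, class) picks matching the designer's intent."""
    total = correct = 0
    for answers in analyst_answers:
        for element, expected in answer_key.items():
            total += 1
            if answers.get(element) == expected:
                correct += 1
    return correct / total

key = {"PatientName": "Person", "InvoiceDate": "Event"}
answers = [
    {"PatientName": "Person", "InvoiceDate": "Event"},
    {"PatientName": "Person", "InvoiceDate": "Document"},
]
rate = agreement_rate(key, answers)  # 3 of 4 picks match: 0.75
print(rate >= 0.95)  # False: below the 95 percent bar, so revisit the uppers
```

Reporting a single agreement number like this is what turns the upper ontology debate from opinion into a repeatable measurement.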
Ambiguous Definitions
Almost every project seems to have a few wonderful “wordsmiths” who can help you write great data element definitions. These are the people who still have an old dog-eared copy of a dictionary in their office. They tend to love to read, they have a love for words, they speak with precision, and they are keen observers of how other people use words to discuss complex topics. These are the people you want on your team to help write your data element definitions. Here is a summarized list of five characteristics of great data element definitions:
- Precise – The definition should use words that have a precise meaning. Try to avoid words that have multiple meanings or multiple word senses.
- Concise – The definition should use the shortest description possible that is still clear.
- Non-Circular – The definition should not use the term being defined in the definition itself; a definition that does is known as a circular definition.
- Distinct – The definition should differentiate a data element from other data elements. This process is called disambiguation.
- Unencumbered – The definition should be free of embedding rationale, functional usage, domain information, or procedural information.
Once you have a great definition, make sure that every class, property, range value, and derived artifact carries the definition with it. It is disappointing to open an OWL file, an XML Schema, or a relational database only to find that none of the tables or columns have definitions and you are left to guess at the meaning.
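As a rough illustration of the non-circularity rule, a simple lint can flag definitions that reuse the term being defined (a heuristic sketch, not a complete check):

```python
# Sketch: a heuristic lint for circular definitions. It flags any
# definition that reuses a word from the term being defined.
import re

def is_circular(term, definition):
    """True if any word of the term appears in its own definition."""
    words = re.findall(r"[A-Z][a-z]+|[a-z]+", term)  # split CamelCase
    text = definition.lower()
    return any(re.search(rf"\b{w.lower()}\b", text) for w in words)

print(is_circular("PersonName", "The name of a person."))              # True
print(is_circular("PersonName", "A label that refers to a human being."))  # False
```

A check like this will produce false positives (some reuse is harmless), so treat its output as items for a human reviewer rather than hard failures.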
Mixing Definitions and Descriptions
|Figure 4. Ontology Mapping:This figure shows an example of ontology mapping.
Each data element in your ontology needs a short definition that appears next to a graphical representation of the data element. This short definition must be clear and concise, distinguishing the data element from similar ones in your ontology and clarifying its semantics. Many ontology tools and XML Schema mapping tools (see Figure 4) can display definitions next to a graphical representation of the data element; some display a definition when the user hovers over a data element with the mouse. However, you should not write a definition longer than one or two lines. Anything longer belongs in a descriptive note in your ontology management system, which can include detailed discussions of usage, exceptions, and department-specific business rules on the use of the data element.
Many tools also let you click a hypertext link to open a full-page description of a data element, including facts such as who added it, when they added it, and what systems might be impacted if the data element changes. This feature is called data element traceability, and it is critical to helping people understand your ontology and trust its credibility.
Poor Search and Reporting
Earlier sections discussed the duplication of data elements and why search tools become necessary as ontologies grow and multiple ontology designers get involved. This section covers some specific problems related to searching ontologies and related assets. The first problem is that ontologies tend to be highly structured documents with complex relationships. You cannot easily break an ontology into a simple collection of classes, properties, and value domains and then put these items into three tables in a relational database. Although you can create fast searches using standard relational databases, you will find that storing ontologies in relational structures is just not flexible enough.
A much more practical approach is to store ontologies in a native XML database. This keeps the complex structures in an ontology while still allowing complex full-text searches. Some native XML databases, such as the open source eXist database, also have WebDAV interfaces that make adding ontologies as simple as a drag-and-drop into a folder. Ontologies are then automatically indexed and can be searched using standards such as XQuery. Open source reporting tools such as the Eclipse BIRT plugin can be used to create high-quality reports.
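As a rough illustration of searching an ontology as a structured document, the snippet below scans `rdfs:label` and `rdfs:comment` text in a tiny in-memory OWL/RDF sample using only the standard library; a native XML database such as eXist would index and query this for you via XQuery:

```python
# Sketch: a keyword search over rdfs:label and rdfs:comment text in an
# RDF/XML document. The sample document is a made-up fragment.
import xml.etree.ElementTree as ET

OWL_SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="#Person">
    <rdfs:label>Person</rdfs:label>
    <rdfs:comment>A human being.</rdfs:comment>
  </rdf:Description>
</rdf:RDF>"""

RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"

def search(xml_text, keyword):
    """Return label/comment texts containing the keyword (case-insensitive)."""
    root = ET.fromstring(xml_text)
    hits = []
    for node in root.iter():
        if node.tag in (RDFS + "label", RDFS + "comment"):
            if keyword.lower() in (node.text or "").lower():
                hits.append(node.text)
    return hits

print(search(OWL_SAMPLE, "human"))  # ['A human being.']
```

Note how the search walks the document structure rather than flattening it into rows, which is exactly why a native XML store fits ontologies better than three relational tables.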
Lack of Versioning and Traceability
Versioning any asset is critical for building long-term enterprise trust that the asset was thoughtfully created by a trained team member and approved by a review team representing stakeholders across the enterprise. One of the best practices is to use a web-based version control system such as Subversion to store ontologies and view their histories. Subversion has many supporting tools that provide user-friendly colored diffs showing what changes were made, by whom, and when. These tools also allow developers to be more aggressive about removing duplicates, because they know the removals can be quickly undone.
Lack of Code-Level Semantics
Many of the glamorous parts of creating an ontology involve creating the highly visible, high-level classes that are used for years across an enterprise. But there are many not-so-glamorous and not-so-visible parts of creating highly precise ontologies that are critical for system interoperability. One of these is the value domains of properties. For example, each state in the U.S. has a two-letter state code. You may have a business rule that a state-code property must use one of the valid state codes. To be complete, an ontology must store:
- Each of these codes
- Creation dates
- Extended properties, including:
  - Who created the code
  - How it is distinct from other codes
  - Whether a code has been deprecated, in a searchable structure
Documenting precise semantics for each of the codes in your system can comprise over half of the work in an ontology, and reviewers need to review each code with the same process used for other data elements. This is the drudgery of building an ontology, but it is what gives an enterprise project credibility.
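A minimal sketch of a value domain that carries this code-level metadata (the entries and field names are illustrative):

```python
# Sketch: a value domain whose codes carry definitions, creation
# metadata, and a searchable deprecation flag. Entries are illustrative.
from datetime import date

STATE_CODE_DOMAIN = {
    "MN": {
        "definition": "The state of Minnesota.",
        "created": date(2007, 3, 1),
        "created_by": "jdoe",       # who created the code (hypothetical)
        "deprecated": False,        # searchable deprecation flag
    },
    "NY": {
        "definition": "The state of New York.",
        "created": date(2007, 3, 1),
        "created_by": "jdoe",
        "deprecated": False,
    },
}

def validate_state_code(code):
    """A code is valid only if it exists and has not been deprecated."""
    entry = STATE_CODE_DOMAIN.get(code)
    return entry is not None and not entry["deprecated"]

print(validate_state_code("MN"))  # True
print(validate_state_code("ZZ"))  # False: not in the value domain
```

Because each code is a first-class entry with its own history, reviewers can approve, annotate, or deprecate individual codes with the same workflow used for classes and properties.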
Taking a Leading Role in Your Organization
|Figure 5. Data Element Approval:To help your organization, create a shared process for defining data element approval.
Many organizations empower enterprise ontologists to become the keepers of semantic precision. Ontologists can become the core team that helps an organization:
- Enforce consistent semantics for shared business rules
- Create a shared process for defining data element approval (see Figure 5)
- Create shared meaning of conformed dimensions in a data warehouse
- Create consistent product taxonomies
- Create consistent integration maps that map database systems to web services
- Create consistent leaf-level data elements that move between any two computer systems
This precision and consistency allows organizations to save time and money when building complex systems. Tools that find trusted data elements quickly allow these organizations to be agile. Together, ontologists and semantic web technologies can play a leading role in large-scale enterprise cost savings.