The specifications behind the Semantic Web provide the ability to encode, link, and reason about data.
Historically, it has been impossible to characterize an unqualified URL as a document or a reference to a
non-network-addressable resource. The W3C Technical Architecture Group (TAG) has recently reached a decision that
these non-networked resources can be given URIs/URLs. An infrastructure that resolves these references can indicate
the special status by returning an HTTP response code of 303 (See Other) instead of a 200 (OK). The
Online Computer Library Center's (OCLC)
Persistent URL infrastructure was recently rearchitected for scalability and a series of new features, such as
support for this 303 guidance. This
new version lays down some key infrastructure for assigning good, resolvable names for terms and concepts, something that has been sorely missing in the Semantic Web technology stack. As such, the new system can define concepts to disambiguate RDF subjects. URIs can be given to proteins, people, legislation, places, etc. While historically you may have chosen a pseudo-canonical URL from a site such as Wikipedia, now it is possible to define a new canonical URL for the terms and subjects that are of interest to your organization.
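The 303 guidance described above can be sketched in a few lines of Python. This is only an illustration, not OCLC's implementation: the `resolve` function, the `CONCEPTS` and `DOCUMENTS` tables, and every path in them are hypothetical names invented for the example. The idea is that a URI naming a thing (a person, a protein) answers with 303 and points to a document about that thing, while a URI naming a document answers with 200.

```python
from http import HTTPStatus

# Hypothetical registry: URIs for non-network-addressable things, each mapped
# to a document that describes it. All paths are made up for illustration.
CONCEPTS = {
    "/id/person/alice": "/doc/person/alice.rdf",
    "/id/protein/p53": "/doc/protein/p53.rdf",
}

# Hypothetical documents that can be served directly.
DOCUMENTS = {
    "/doc/person/alice.rdf": "<rdf:RDF>...</rdf:RDF>",
    "/doc/protein/p53.rdf": "<rdf:RDF>...</rdf:RDF>",
}


def resolve(path):
    """Return (status, headers, body) for a requested path."""
    if path in CONCEPTS:
        # The URI names a thing, not a document: answer 303 See Other and
        # point the client at a document describing the thing.
        return HTTPStatus.SEE_OTHER, {"Location": CONCEPTS[path]}, ""
    if path in DOCUMENTS:
        # The URI names a document: serve it with a plain 200 OK.
        headers = {"Content-Type": "application/rdf+xml"}
        return HTTPStatus.OK, headers, DOCUMENTS[path]
    return HTTPStatus.NOT_FOUND, {}, ""


if __name__ == "__main__":
    status, headers, _ = resolve("/id/person/alice")
    print(int(status), headers)
    print(int(resolve(headers["Location"])[0]))
```

A client that receives the 303 knows the original URI identifies something other than a document, and can follow the `Location` header to retrieve a description of it.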
Embeddable Semantic Web Applications
Thomson Reuters runs a free site, called
OpenCalais, for identifying terms and concepts from within unstructured text. With plugins such as
Gnosis for Firefox, it is possible to run the OpenCalais service directly on the pages you visit to identify
people, places, organizations, industries, etc., even on sites that do not publish information with support for
GRDDL, microformats, and RDFa. These extracted terms can then be linked back into other data sources to automate the
process of extracting information as you surf the Web. This service is a step toward a larger vision.
Thomson Reuters' CEO has even caught Semantic Web skeptic Tim O'Reilly's attention with his vision of where this is going.
Another Firefox plugin,
Solvent from the Simile project, makes it easy for you to compose lightweight and shareable screen scrapers to extract content from arbitrary pages. This highlights that, while it is great when sites support Semantic Web technologies, the success of the vision does not require everyone to get on board. Automated and semi-automated extraction are key approaches to linking content in structured and unstructured forms.
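To give a feel for how lightweight such a scraper can be, here is a minimal sketch in Python using only the standard library's `html.parser`. It is not how Solvent works (Solvent runs inside the browser); the `LinkScraper` class, the `scrape_links` helper, and the sample markup are all hypothetical and exist only to illustrate pulling structured pairs out of a page that publishes no semantic markup.

```python
from html.parser import HTMLParser


class LinkScraper(HTMLParser):
    """Collect (href, link text) pairs from anchor tags that carry an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the anchor currently being read, if any
        self._text = []    # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_links(html):
    """Return the list of (href, text) pairs found in an HTML string."""
    scraper = LinkScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.links


# Made-up sample page for illustration.
SAMPLE = (
    '<p>See <a href="http://example.org/a">Item A</a> and '
    '<a href="http://example.org/b">Item B</a>.</p>'
)

if __name__ == "__main__":
    for href, text in scrape_links(SAMPLE):
        print(href, text)
```

The extracted pairs could then be linked to other data sources, which is the semi-automated extraction step the paragraph above describes.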
Support By Open Source and Commercial Organizations
One of the major barriers to adoption of semantic technologies has been the lack of support in software. There have
always been quality parsing, producing, and querying APIs, but major software initiatives have in general taken a
wait-and-see approach. This is becoming less of an issue as major open source initiatives such as
Mozilla have committed to supporting RDF and SPARQL.
Perhaps more valuable than adoption by Open Source projects is the long-anticipated support for the technologies by major commercial software players. This too has finally come to pass. Oracle was one of the first major vendors to adopt RDF and OWL in its database engines. It cleverly co-opted its existing Spatial Engine (with its network data model) to support the graph models of RDF. It is now possible to mix RDF and non-RDF data within the same database engine.
Industry giants Yahoo! and Microsoft have also been making announcements and acquisitions in this space. Google is
promoting interoperability in the social networking world through
OpenSocial, while MySpace, eBay, Twitter, and Yahoo! are pursuing
New technology companies have emerged along the way with tools to help developers, knowledge workers, and other
organizational stakeholders build software systems around these ideas.
TopQuadrant's TopBraid Suite,
Franz's AllegroGraph, the
Thetus Publisher, and
OpenLink's Virtuoso server are among the leaders of these emerging markets.
Semantic Arts and
Sandpiper Software are working with major corporations around the world to help them adopt these ideas through training, strategic guidance, and implementation assistance.
The pain of failed Enterprise Application Integration (EAI) and Service-Oriented Architecture (SOA) initiatives is
driving financial services, news media, insurance, and other conservative industries to look for new solutions to
their IT needs. Those industries are considering the successes of the Web and want to know how to adopt those ideas internally.
The Cleveland Clinic is a leader in adopting Semantic Web technologies to improve its ability to meet the
needs of its patients. The goals of the clinic are to lower its IT costs, add business functionality, and avoid the technology flux treadmill. The clinic's aim is not to use semantic technologies for their own sake, but to apply them where they are viable solutions.
Learning About Semantic Technologies
Developers, managers, and executives can learn about Semantic Web technologies at major conferences. This year, relevant content has appeared at:
New books are being published to help people navigate these technologies, including titles on
modeling with OWL.
Semantic Web technologies are here in many important ways, and you are most likely using them on a daily basis, even if only indirectly. The success of these technologies is not simply a question of everyone adopting the same models and the same terms; it is about a rich and vibrant ecosystem of data, documents, and software tied together in useful ways.