There is no question that the web is an unprecedented success. It is the single most adventurous and useful platform for information exchange ever conceived and built. The architectural choices that went into its design have lent it scalability, flexibility, and the ability to grow into new business models, application-level technologies, and varied uses.
Currently, the web is designed for use by people, not by software. Visitors to a site such as Amazon—which provides book and movie ratings, information about available used copies, related products, and so forth—can easily parse the content visually. It is much more difficult for software to act on a visitor's behalf, because the information is tied up in the presentation structure. It is possible to write software that scrapes these kinds of pages, but when Amazon's designers change the look or style of the pages, the scrapers are likely to break.
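To see why, consider a minimal sketch of such a scraper. The markup and the `price` class name are invented for illustration (they are not Amazon's real markup), but the failure mode is representative: the scraper keys off a presentation detail, so a purely cosmetic redesign breaks it.

```python
from html.parser import HTMLParser

class PriceScraper(HTMLParser):
    """Collects the text inside any tag whose class attribute is 'price'."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == "price":
            self.in_price = True

    def handle_endtag(self, tag):
        self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

scraper = PriceScraper()
scraper.feed('<span class="price">$12.99</span>')
print(scraper.prices)  # ['$12.99']

# After a cosmetic redesign renames the class, the scraper silently fails:
scraper2 = PriceScraper()
scraper2.feed('<span class="item-cost">$12.99</span>')
print(scraper2.prices)  # []
```

The page still displays the same price to a human visitor; only the software consumer is broken.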
A solution to extracting information from heterogeneous content, and to other data integration problems, has been envisioned for the Web from the start. This vision is known as the Semantic Web, and it promotes the development of software systems capable of sharing, integrating, and supporting machine processing of the Web's data. The result is no longer a web of documents; it is a web of data.
People have been excited and skeptical about the Semantic Web since the beginning. Early hype, a long path to working systems, disagreements about goals and strategies, and all-around confusion have left the skeptics feeling smug and self-congratulatory.
Let the skeptics have their moment. In the meantime, Semantic Web technologies are continually and quietly enriching the existing web indirectly. The skeptics might be surprised by the companies already using these technologies to solve real problems today.
RDF Where There is None
An early complaint about the Semantic Web vision was that no one would enter quality metadata. The other great criticism was that no one would ever convert their data to the Resource Description Framework (RDF) model. While these seem like reasonable critiques, in practice they are not proving true.
First of all, sites like Delicious, Flickr, and other folksonomy-based sites demonstrate that when the bar is lowered and the value is demonstrated, people will happily contribute tags and other metadata. Delicious can filter out typos and bogus tags by looking at the most common terms for a page. The challenge to the proponents of Semantic Web technologies is to make it as easy to select terms from standard and shared vocabularies as it is to type arbitrary tags.
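The filtering idea is simple to sketch: treat the most common tags for a page as signal and rare one-off tags as probable typos or noise. The threshold below is invented for illustration; a real service would tune it against actual tagging behavior.

```python
from collections import Counter

def filter_tags(tags, min_share=0.05):
    """Keep tags applied by at least min_share of all taggings;
    rare one-off tags (often typos) fall below the threshold."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag for tag, n in counts.items() if n / total >= min_share}

# 'semwbe' and 'asdf' are a typo and a bogus tag, each used once.
tags = ["semweb"] * 40 + ["rdf"] * 30 + ["semwbe"] + ["asdf"]
print(sorted(filter_tags(tags)))  # ['rdf', 'semweb']
```

The harder problem the text identifies remains a user-interface one: making it as easy to pick a term from a shared vocabulary as it is to type a free-form tag.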
Secondly, new technologies are eliminating the need to convert data to RDF by hand. These include Gleaning Resource Descriptions from Dialects of Languages (GRDDL), RDFa, and SPARQL endpoints. GRDDL and RDFa allow RDF to be produced through standard transformations from existing XML and XHTML resources. Simple markup, no more complicated than current presentation markup, allows proper metadata to be mixed into the presentation structure without domain-specific hackery like microformats. With these tools in place and supported by content publishers, it will be trivial to extract publication metadata, licensing information, geotagging information, and the like from the pages you visit. It is also possible to link this extracted information to other data sources for further discovery.
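As a rough sketch of how such an extraction works, metadata embedded in presentation markup can be pulled out mechanically. The `about`/`property` attributes below follow RDFa conventions, but the document, URIs, and vocabulary prefixes are invented for illustration; a real GRDDL or RDFa processor is far more thorough.

```python
import xml.etree.ElementTree as ET

XHTML = """
<div xmlns="http://www.w3.org/1999/xhtml" about="http://example.org/book/42">
  <span property="dc:title">A Sample Book</span>
  <span property="dc:creator">A. Author</span>
</div>
"""

def extract_statements(xhtml):
    """Walk the markup and emit (subject, property, value) triples
    from RDFa-style about/property attributes."""
    root = ET.fromstring(xhtml)
    subject = root.get("about")
    triples = []
    for elem in root.iter():
        prop = elem.get("property")
        if prop is not None:
            triples.append((subject, prop, elem.text))
    return triples

for triple in extract_statements(XHTML):
    print(triple)
```

The same markup that renders the title and author for a human reader yields machine-readable statements about the resource, with no separate metadata file to maintain.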
SPARQL endpoints allow RDF views into both RDF and non-RDF data. Some projects leverage other technologies as well; for example, the Mulgara Semantic Store uses D2RQ in its Relational Resolver to allow RDF queries to include results from non-RDF data sources. This kind of combination allows the RDF model to be populated with content from existing Customer Relationship Management (CRM), Enterprise Resource Planning (ERP), and other relational systems. There is no need to convert the data and store it as RDF; it is generated on the fly.
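The on-the-fly idea can be sketched without any RDF toolkit: map relational rows into triples at query time rather than storing RDF. The table, the column-to-predicate mapping, and the `ex:` URIs below are all invented for illustration; D2RQ expresses the same kind of mapping declaratively and at much larger scale.

```python
import sqlite3

# A stand-in for an existing relational (e.g., CRM) table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
db.execute("INSERT INTO customers VALUES (1, 'Acme Corp', 'Paris')")

# Column-to-predicate mapping, in the spirit of a D2RQ mapping file.
PREDICATES = {"name": "ex:name", "city": "ex:city"}

def triples_for(table):
    """Generate (subject, predicate, object) triples on the fly
    from relational rows; nothing is ever stored as RDF."""
    cursor = db.execute(f"SELECT id, name, city FROM {table}")  # noqa: table name trusted here
    for row_id, name, city in cursor:
        subject = f"ex:{table}/{row_id}"
        yield (subject, PREDICATES["name"], name)
        yield (subject, PREDICATES["city"], city)

for t in triples_for("customers"):
    print(t)
```

The relational system remains the system of record; the RDF view is just another way of reading it.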
Linking Open Data Project
As RDF data is made available publicly on the web and in the enterprise, it becomes possible to create relationships across data sources. The Linking Open Data project has gained tremendous momentum in the past year and is now connecting billions of triples worth of data through billions of links.
As an example, consider the thousands of Wikipedia volunteers who curate the concepts and relationships that keep the site up-to-date and (presumably) accurate. These include facts such as that the Louvre is a museum located in Paris, France. These terms and relationships are converted monthly into RDF and exposed through the DBpedia project.
It is now possible to take a term from Wikipedia, query DBpedia for metadata about that term, and convert the term's alternate names and geographic information into a Flickr query for pictures constrained to a specific location. Following our previous example of the Louvre, such a query turns up a slew of high-quality, related pictures from Flickr.
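A sketch of that last step: given geographic metadata of the kind a DBpedia lookup returns, compose a location-constrained photo search. The coordinates are the Louvre's approximate location, and the endpoint and parameter names follow Flickr's `flickr.photos.search` method, but treat all of them as illustrative and check the current API documentation before relying on them.

```python
from urllib.parse import urlencode

# Metadata of the sort a DBpedia lookup might return (values illustrative).
metadata = {"label": "Louvre", "lat": 48.8606, "long": 2.3376}

def flickr_search_url(meta, radius_km=1):
    """Compose a Flickr photo-search URL constrained to a location."""
    params = {
        "method": "flickr.photos.search",
        "text": meta["label"],
        "lat": meta["lat"],
        "lon": meta["long"],
        "radius": radius_km,
    }
    return "https://api.flickr.com/services/rest/?" + urlencode(params)

url = flickr_search_url(metadata)
print(url)
```

The point is not the specific API but the pipeline: structured data extracted from one source mechanically parameterizes a query against another.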
Now, imagine taking all of that information, tying together social networking sites (via OpenSocial and
Friend-of-a-Friend (FOAF) profiles), Creative Commons information, geotagging information, Dublin Core publication
metadata, the CIA Factbook, U.S. Census information, etc., and you see the emergence of a web of data.