State of the Semantic Web: Know Where to Look

Those looking for evidence of progress on the Semantic Web do not have to look far. Several major projects and companies are embracing the vision and technology stack like never before.

There is no question that the web is an unprecedented success. It is the single most adventurous and useful platform for information exchange ever conceived and built. The architectural choices that went into its design have lent it scalability, flexibility, and the ability to grow into new business models, application-level technologies, and varied uses.

Currently, the web is designed for use by people, not by software. Visitors to a site such as Amazon, which provides book and movie ratings, information about available used copies, related products, and so forth, can easily parse the content visually. It is much more difficult for software to act on a visitor's behalf, because the information is tied up in the presentation structure. It is possible to write software that scrapes these kinds of pages, but when Amazon's designers change the look or style of the pages, the scrapers are likely to break.

A solution to the problem of extracting information from varied content, and to other data integration problems, has been envisioned for the Web from the start. This vision is known as the Semantic Web, and it promotes the development of software systems capable of sharing, integrating, and supporting machine processing of the Web's data. In this vision the Web is no longer just a web of documents; it is a web of data.

People have been excited and skeptical about the Semantic Web since the beginning. Early hype, a long path to working systems, disagreements about goals and strategies, and all-around confusion have left the skeptics feeling smug and self-congratulatory.

Let the skeptics have their moment. In the meantime, Semantic Web technologies are continually and quietly enriching the existing web indirectly. The skeptics might be surprised by the companies already using these technologies to solve real problems today.

RDF Where There is None
An early complaint about the Semantic Web vision was that no one would enter quality metadata. The other great criticism was that no one would ever convert their data to the Resource Description Framework (RDF) model. While these seem like reasonable critiques, in practice they are not proving true.

First of all, Delicious, Flickr, and other folksonomy-based sites demonstrate that when the bar is lowered and the value is demonstrated, people will happily contribute tags and other metadata. Delicious can filter out typos and bogus tags by looking at the most common terms for a page. The challenge to the proponents of Semantic Web technologies is to make it as easy to select terms from standard and shared vocabularies as it is to type arbitrary tags.
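The idea of keeping only the most common terms for a page can be sketched in a few lines of Python. This is a minimal illustration, not a description of how Delicious actually works: the function name `common_tags`, the `min_share` threshold, and the sample bookmarks are all hypothetical.

```python
from collections import Counter

def common_tags(user_tags, min_share=0.2):
    """Keep tags applied by at least min_share of the users who tagged a page.

    user_tags is a list with one list of tags per user (hypothetical input shape).
    """
    counts = Counter()
    for tags in user_tags:
        # Count each tag at most once per user, ignoring case and stray whitespace.
        counts.update({t.strip().lower() for t in tags})
    threshold = min_share * len(user_tags)
    return sorted(t for t, n in counts.items() if n >= threshold)

# Made-up tags from five users for a single page.
bookmarks = [
    ["semanticweb", "rdf"],
    ["SemanticWeb", "rdf", "w3c"],
    ["semanticweb", "rfd"],   # a typo
    ["rdf", "semanticweb"],
    ["asdf"],                 # a bogus tag
]

popular = common_tags(bookmarks, min_share=0.4)
print(popular)
```

With a 40 percent threshold, a tag must come from at least two of the five users, so the typo and the bogus tag drop out while the widely shared terms remain.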

Secondly, new technologies are eliminating the need to convert data to RDF directly. These include Gleaning Resource Descriptions from Dialects of Languages (GRDDL), RDFa, and SPARQL endpoints. GRDDL and RDFa allow RDF to be produced through standard transformations from existing XML and XHTML resources. Simple markup, no more complicated than current presentations, allows proper metadata to be mixed in with the presentation structure and with domain-specific hackery like microformats. With these tools in place and supported by content publishers, it will be trivial to extract publication metadata, licensing information, geotagging information, and the like from the pages you visit. It is also possible to link this extracted information to different data sources for further discovery.
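To make the RDFa idea concrete, here is a rough Python sketch that pulls triples out of a small XHTML fragment using only the standard library. It handles just a few RDFa attributes (`about`, `property`, `content`, `rel`, `href`), leaves prefixed names such as `dc:title` unexpanded, and assumes well-formed XHTML, so it is nowhere near a conforming RDFa processor. The class name `TinyRDFaExtractor` and the sample page are invented for illustration.

```python
from html.parser import HTMLParser

class TinyRDFaExtractor(HTMLParser):
    """Collect (subject, property, value) triples from a simplified subset of RDFa."""

    def __init__(self, base):
        super().__init__()
        self.subjects = [base]  # stack of current subjects; the page itself is the default
        self.open_tags = []     # for each open element, whether it pushed a subject
        self.pending = None     # (subject, property) waiting for its text content
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        pushed = "about" in a
        if pushed:
            self.subjects.append(a["about"])
        self.open_tags.append(pushed)
        subject = self.subjects[-1]
        if "rel" in a and "href" in a:
            # A link to another resource, e.g. a license.
            self.triples.append((subject, a["rel"], a["href"]))
        if "property" in a:
            if "content" in a:
                self.triples.append((subject, a["property"], a["content"]))
            else:
                self.pending = (subject, a["property"])

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((*self.pending, data.strip()))
            self.pending = None

    def handle_endtag(self, tag):
        self.pending = None
        if self.open_tags and self.open_tags.pop():
            self.subjects.pop()

# Made-up sample markup; the URLs and values are placeholders.
page = """
<div about="http://example.org/post/1">
  <h1 property="dc:title">Example Post</h1>
  <meta property="dc:date" content="2008-01-01" />
  <a rel="cc:license" href="http://creativecommons.org/licenses/by/3.0/">CC BY</a>
</div>
"""

extractor = TinyRDFaExtractor(base="http://example.org/")
extractor.feed(page)
extractor.close()
for triple in extractor.triples:
    print(triple)
```

The point of the sketch is that the metadata rides along in the same markup a browser renders, so a publisher does not have to maintain a separate RDF document.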

SPARQL endpoints allow RDF views into both RDF and non-RDF data. Some projects leverage other technologies, such as Mulgara Semantic Store, which uses D2RQ in its Relational Resolver to allow RDF queries to include results from non-RDF data sources. This kind of combination allows the RDF model to be populated with content from existing Customer Relationship Management (CRM), Enterprise Resource Planning (ERP), and other relational systems. There is no need to convert the data and store it as RDF; it is generated on the fly.
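The "generated on the fly" idea can be illustrated with a small Python sketch that exposes rows of a relational table as triples at query time, without storing any RDF. This is only a toy under stated assumptions: it does not use D2RQ, Mulgara, or SPARQL, and the `BASE` namespace, the `customer` table, and the helper names are hypothetical.

```python
import sqlite3

BASE = "http://example.org/crm/"  # hypothetical namespace for generated URIs

def triples_from_table(conn, table, key, columns):
    """Yield (subject, predicate, object) triples for each row, built at query time.

    Table and column names are interpolated directly, so they must be trusted
    identifiers; this is acceptable only in a sketch.
    """
    cols = ", ".join([key] + columns)
    for row in conn.execute(f"SELECT {cols} FROM {table}"):
        subject = f"{BASE}{table}/{row[0]}"
        for col, value in zip(columns, row[1:]):
            if value is not None:  # NULL columns simply produce no triple
                yield (subject, f"{BASE}{table}#{col}", value)

def match(triples, predicate=None, obj=None):
    """A very small stand-in for a triple-pattern query."""
    return [
        t for t in triples
        if (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

# An in-memory table standing in for an existing CRM database (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Acme", "Paris"), (2, "Globex", None)],
)

in_paris = match(
    triples_from_table(conn, "customer", "id", ["name", "city"]),
    predicate=BASE + "customer#city",
    obj="Paris",
)
print(in_paris)
```

The relational data stays where it is; the triples exist only for as long as the query that asks for them.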

Linking Open Data Project
As RDF data is made available publicly on the web and in the enterprise, it becomes possible to create relationships across data sources. The Linking Open Data project has gained tremendous momentum in the past year and is now connecting billions of triples' worth of data through billions of links.

As an example, consider the thousands of Wikipedia volunteers who curate the concepts and relationships that keep the site up-to-date and (presumably) accurate. These include facts such as the Louvre being a museum in Paris, France. These terms and relationships are now converted monthly into RDF and are exposed at DBpedia.

It is now possible to take a term from Wikipedia, query DBpedia for metadata about this term, and convert the alternate names for the term and its geographic information into a Flickr query for pictures constrained to a specific location. Following our previous example of the Louvre, such a query turns up a slew of high-quality, related pictures from Flickr.
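As a rough sketch of that pipeline, the Python below builds a SPARQL query for a DBpedia resource and then turns labels and coordinates into a Flickr search URL. Several things here are assumptions rather than details from this article: the choice of `rdfs:label` and the W3C `geo:lat`/`geo:long` properties, the use of the `flickr.photos.search` method with its `tags`, `tag_mode`, `lat`, `lon`, and `radius` parameters, the REST endpoint URL, and the placeholder API key. No network calls are made; the labels and coordinates are hand-written stand-ins for what a DBpedia query might return.

```python
from urllib.parse import urlencode

def dbpedia_query(resource_uri):
    """Build a SPARQL query asking for a resource's labels and coordinates."""
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?label ?lat ?long WHERE {{
  <{resource_uri}> rdfs:label ?label ;
                   geo:lat ?lat ;
                   geo:long ?long .
}}""".strip()

def flickr_search_url(labels, lat, lon, radius_km=1, api_key="YOUR_API_KEY"):
    """Build a location-constrained Flickr photo search URL (assumed parameters)."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,            # placeholder, not a real key
        "tags": ",".join(labels),      # alternate names for the term
        "tag_mode": "any",             # match any of the names
        "lat": lat,
        "lon": lon,
        "radius": radius_km,
        "format": "json",
    }
    return "https://api.flickr.com/services/rest/?" + urlencode(params)

sparql = dbpedia_query("http://dbpedia.org/resource/Louvre")

# Hand-written stand-in values, not fetched results.
labels = ["Louvre", "Louvre Museum"]
url = flickr_search_url(labels, lat=48.8606, lon=2.3376)

print(sparql)
print(url)
```

In a real application the SPARQL string would be sent to a DBpedia endpoint and the returned bindings would replace the stand-in values before the Flickr URL is built.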

Now, imagine taking all of that information, tying together social networking sites (via OpenSocial and Friend-of-a-Friend (FOAF) profiles), Creative Commons information, geotagging information, Dublin Core publication metadata, the CIA Factbook, U.S. Census information, etc., and you see the emergence of a web of data.
