Gleaning Information From Embedded Metadata
Put GRDDL-enabled agents to the task of extracting valuable information from machine-processable metadata embedded in documents, courtesy of prevailing semantic web standards. 

One of the fundamental visions of the semantic web is the ability to provide improved technologies for machine-processable data. The current web is a swell place for people, but absent a series of open, global standards for metadata, it is difficult to imagine the interoperability necessary to link software, data, and documents in all their various forms. (Note that only the standards are required to be global; the specific terms, relationships, and concepts can be as diverse as the communities they reflect.)

While the vision eventually spirals off into spiders, agents, and bots, you do not have to go quite that far to imagine how vitally useful this automated data processability will be. Right now, the only real metadata available everywhere is the address of the documents you browse and the date and time when you did so. You can collect this data in your browser or on sites like del.icio.us, and use tags that you create to find the documents again later. In doing so, you are externalizing the metadata about the document either into a taxonomy (that is, browser history menus) or through keyword tags. It is the browsing experience that directly provides the where and when.

These two fundamental pieces of information are important, but the entire spectrum of expressible metadata offers a compelling promise of semi-automated data gathering that is just now beginning to be appreciated. Imagine passively tracking who wrote each of the pages you visit and where these authors work and live, what they are interested in, and who they know. Consider the efficiency of looking back at what you have perused and being told which documents are Creative Commons licensed in ways that allow you to directly mine what you have read as long as you attribute accordingly. Or, how about hitting a band's web page and capturing when they are going to be playing in your town?

One of the biggest complaints about this vision, however, is that people will not be willing to put in the effort to produce and maintain quality metadata. Critics argue that without this solid foundation, the whole house of cards will fall or fail to emerge in the first place. Sites like del.icio.us and Flickr, similar folksonomy-based approaches, and the rampant success of Atom/RSS feeds seem to disprove these concerns. For the purposes of this article, the assumption is simply that at least some publishers will be willing to make the effort.

The question is, how do you go about embedding this information into your web pages?

HTML and XHTML traditionally have had only modest support for metadata tags. They also have structural guidelines that make directly adding metadata more difficult than you might expect. Historically, developers and publishers have played some clever games to put domain-specific metadata into HTML by using microformats—for more information on microformats, see the article, "Discover Microformats for Embedding Semantics" (DevX, July 4, 2007). While useful, these specific formats fail to support an open-ended metadata language like the Resource Description Framework (RDF), which allows the use, reuse, and mixture of open-ended vocabulary spaces. You cannot ignore microformats, and you shouldn't, because they have been adopted successfully and extensively, but they simply do not paint a complete picture.
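To make the microformat approach concrete, here is a minimal sketch of how an agent might pull hCard-style properties out of class attributes. The sample markup, the `HCardSketchParser` class, and the `extract_hcard` helper are illustrative assumptions rather than part of any microformat toolkit, and only two hCard properties are recognized.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with the hCard microformat.
SAMPLE_HCARD = """
<div class="vcard">
  <span class="fn">Jane Example</span> works at
  <span class="org">Example Corp</span>.
</div>
"""


class HCardSketchParser(HTMLParser):
    """Collects the text of elements whose class names are hCard properties."""

    # Only a small subset of hCard, chosen for illustration.
    PROPERTIES = {"fn", "org"}

    def __init__(self):
        super().__init__()
        self.card = {}
        # One entry per open element: the property it carries, or None.
        # Assumes well-formed markup in which every element is closed.
        self._stack = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in self.PROPERTIES), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text only to the innermost open element, and only
        # if that element carries a recognized property.
        if self._stack and self._stack[-1]:
            prop = self._stack[-1]
            self.card[prop] = self.card.get(prop, "") + data.strip()


def extract_hcard(html):
    """Return a dict of the recognized hCard properties found in html."""
    parser = HCardSketchParser()
    parser.feed(html)
    return parser.card


if __name__ == "__main__":
    print(extract_hcard(SAMPLE_HCARD))
```

The fixed `PROPERTIES` set is the point of the sketch: the agent has to know each microformat's vocabulary in advance, which is the limitation that an open-ended language like RDF is meant to remove.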

The World Wide Web Consortium (W3C) is working on including richer metadata support in HTML/XHTML with emerging standards such as RDF in attributes (RDFa), embedded RDF (eRDF), and so on. These standards allow more specific metadata to be attached to different structural and presentation elements, so that a single document serves as a unified information resource. Avoiding data duplication and the forking of information resources into separate text/data and metadata copies is a key goal of these efforts, which are currently in the works and will likely result in very compelling strategies to solve this problem. Now it is time to take a look at what is available and widely usable.
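As a rough illustration of metadata attached directly to structural elements, the following sketch reads a few RDFa-style attributes (`about`, `property`, `content`, `rel`, `href`) and turns them into subject/predicate/object tuples. It is not a conformant RDFa processor: prefix resolution and most of the processing rules are ignored, and the sample markup, the example.org address, and the names `RDFaSketchParser` and `extract_triples` are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical XHTML fragment carrying RDFa-style attributes.
SAMPLE_RDFA = """
<div about="http://example.org/post/1">
  <span property="dc:title">Hello</span>
  <span property="dc:creator">Jane Example</span>
  <a rel="cc:license" href="http://creativecommons.org/licenses/by/3.0/">CC BY</a>
</div>
"""


class RDFaSketchParser(HTMLParser):
    """Collects (subject, predicate, object) tuples from a few RDFa attributes."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subjects = []  # subject in effect for each open element
        self._pending = []   # property awaiting text content, or None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # An element without its own "about" inherits its parent's subject.
        inherited = self._subjects[-1] if self._subjects else ""
        subject = a.get("about") or inherited
        self._subjects.append(subject)

        pending = None
        if "rel" in a and "href" in a:
            # A link relation: the object is the linked resource.
            self.triples.append((subject, a["rel"], a["href"]))
        if "property" in a:
            if "content" in a:
                # An explicit literal overrides the element's text.
                self.triples.append((subject, a["property"], a["content"]))
            else:
                pending = a["property"]
        self._pending.append(pending)

    def handle_endtag(self, tag):
        if self._subjects:
            self._subjects.pop()
            self._pending.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._pending and self._pending[-1]:
            self.triples.append((self._subjects[-1], self._pending[-1], text))


def extract_triples(html):
    """Return the list of tuples found in html, in document order."""
    parser = RDFaSketchParser()
    parser.feed(html)
    return parser.triples


if __name__ == "__main__":
    for triple in extract_triples(SAMPLE_RDFA):
        print(triple)
```

The visible text ("Hello", "Jane Example") doubles as the metadata value, so the page and its metadata cannot drift apart the way a separate metadata file could.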
