Discover Microformats for Embedding Semantics

The Resource Description Framework (RDF) and Web Ontology Language (OWL) are important technologies driving development on the road to the semantic web. The former is a set of World Wide Web Consortium (W3C) specifications that provide a model for representing metadata through specific statements, or triples, made up of subject-predicate-object representations for specific resources. Data from disparate stores can then be mashed together or built into resources of machine-readable information, which can be processed, exchanged, and stored by web-based applications. The latter technology, OWL, is currently a W3C recommendation for a language that can be applied to define and represent data models more effectively than other metadata languages, such as XML, through the use of semantics and a vocabulary that provide class and property descriptions.

These and other technologies being adopted to build out the semantic web involve highly complex concepts and analytics, and it will take time for developers and designers to refine and implement the semantic web's ability to extract, process, and deliver intelligent information efficiently. Despite the learning curve, many people building applications that aim to provide Web 3.0 capabilities will need to begin embedding semantics now to get a head start on delivering more meaningful, information-based content to customers and users. One way to embed semantics for selected applications is to use microformats.

Microformats provide extensions to the standard HTML tags that have been used widely for some time to create web pages, and they are open and freely available elements for semantic-based markup in HTML or Extensible HTML (XHTML). Consider microformats as a means of lowering the barrier to your entry in semantic web development. They can even give web page designers without extensive programming experience the ability to program web sites. This discussion will take a look at microformats and demonstrate how they can be a convenient stepping-stone for developers and designers looking to participate in the evolution of the semantic web.

Embedded Semantics
Rather than reinvent the web, microformats allow developers to approach the problem of embedding semantics from the perspective of existing and widely adopted web standards. Microformat-aware browsers can parse microformat markup and use it to help extract meaning from web pages. Standard HTML markup describes only how text should be formatted. Microformats allow programs such as web crawlers to recognize items like contact information, events, and so on, which can be added to address books and calendars. Microformats also give you the ability to aggregate content or create “mashups,” such as adding a restaurant review to a MapQuest map.

Although they are in effect an attempt to turn a medium designed for publishing and presentation into something that is dynamic and programmable, it is important that microformats be designed for humans first and for machines second. Microformats should therefore be human readable and easily understood by content authors and designers as well as more experienced programmers. One way to think about microformats is that they are about people, events, places, and things, rather than just pages.

Think about the last developer conference you attended. Wouldn’t it have been easier to manage your workshop schedule if the agenda on the conference web site could have found its way directly into the calendar on your laptop or PDA? Microformats can enable this kind of scenario.

From a technical point of view, microformats are a form of semantic markup using standard XHTML encoded with specific HTML attributes such as class, rel, and rev. What a machine would otherwise see as just text gains meaning through its context, which is indicated by wrapping items in span (or other HTML) elements with class names that are part of a specific microformat specification. A set of class names (for example, formatted name (fn), organization (org), telephone number (tel), and url), nested inside a root element marked with class="vcard", can be understood as a single, specific entity for the purposes of data exchange. Once detected, this information can be extracted by software and reused (indexed, searched for, saved, or cross-referenced). When encoded in this way, it can either be used by web services in a more programmatic manner or be imported into desktop applications.
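To make the extraction step concrete, here is a minimal Python sketch that reads the class names described above out of a small hCard. The sample markup, the `HCardParser` class, and the `parse_hcard` helper are assumptions made for this illustration; only the class names (vcard, fn, org, tel, url) come from the hCard specification. The sketch assumes a single hCard whose properties are not nested inside one another.

```python
from html.parser import HTMLParser

# Property class names mentioned in the text; the hCard specification defines more.
HCARD_PROPERTIES = {"fn", "org", "tel", "url"}

# Hypothetical markup written for this illustration.
SAMPLE = """
<div class="vcard">
  <a class="url fn" href="http://www.devx.com/">Lee Sherman</a>
  <span class="org">Jupitermedia</span>
  <span class="tel">+1-415-555-1234</span>
</div>
"""


class HCardParser(HTMLParser):
    """Collects the properties of a single, flat hCard."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._in_card = False
        self._current = []  # properties whose value is the upcoming text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "vcard" in classes:
            self._in_card = True
        if not self._in_card:
            return
        self._current = [c for c in classes if c in HCARD_PROPERTIES]
        # On a link, the url value comes from href rather than the link text.
        if "url" in self._current and attrs.get("href"):
            self.card["url"] = attrs["href"]
            self._current.remove("url")

    def handle_endtag(self, tag):
        self._current = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            for prop in self._current:
                self.card[prop] = text


def parse_hcard(html):
    """Return a dict of hCard properties found in the given markup."""
    parser = HCardParser()
    parser.feed(html)
    return parser.card


print(parse_hcard(SAMPLE))
```

A production parser would also need to handle nested properties, multiple cards per page, and the other patterns the specification defines; the point here is only that ordinary class attributes are enough for software to find the data.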

Microformats are not unlike XML tags, but there is one important difference: instead of allowing everyone to create their own custom XML tags, microformats are derived from existing web standards as much as possible. For example, hCard maps one-to-one to the vCard standard that has been in use for years by desktop applications like Microsoft Outlook and Apple’s Address Book.

By basing microformats markup on easily understood and widely adopted standards, the authors of microformats hope to speed adoption and provide a more generalized form of markup than the myriad industry-specific forms of XML.

It’s All in the Metadata
Existing web standards, when used appropriately, are able to convey a great deal of semantic meaning. Standard XHTML elements, together with attributes such as rel, rev, and title, are used to create blocks of code that can be retrieved and reused as needed. As XHTML evolves and additional schemas emerge, it will become even easier to embed semantics in web pages. For example, the microformat rel="nofollow" can be added to an anchor tag to indicate that search engines should ignore those links tagged explicitly this way.
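As a rough sketch of how a crawler might honor rel="nofollow", the Python below separates the links on a page into those it would follow and those it would skip. The `LinkCollector` class, the `split_links` helper, and the example.com URLs are hypothetical; real search engines apply their own rules.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the example.com links are placeholders.
PAGE = (
    '<p><a href="http://www.devx.com/">DevX</a> '
    '<a rel="nofollow" href="http://example.com/ad">sponsored</a> '
    '<a rel="tag nofollow" href="http://example.com/comment">comment link</a></p>'
)


class LinkCollector(HTMLParser):
    """Sorts anchor hrefs by whether their rel attribute contains nofollow."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.skipped = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel holds a space-separated list of values, so split before testing.
        rel_values = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.skipped.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed, skipped) lists of hrefs for the given markup."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.followed, collector.skipped


followed, skipped = split_links(PAGE)
print("follow:", followed)
print("skip:", skipped)
```

Splitting the rel value matters because the attribute may carry several values at once, as in the third link.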

Microformats can also be used to increase relevancy by giving search engines the ability to parse and act on metadata. Microformat-aware browsers will be able to parse this data, making it far easier to integrate the web with desktop applications that make use of it.

Good metadata has always been the key to the accuracy and efficiency of search engines; however, today we often find ourselves at the mercy of legacy approaches where no extra effort was made or required to build metadata into the markup. The use of microformats improves this situation to a certain extent by tying metadata to the markup itself. With explicit, rather than implicit, semantics the search engine’s job becomes easier.

Today’s web sites and rich Internet applications (RIAs) are also collaborative efforts in which the person who built the site may not be the same person maintaining it. Human-readable formats minimize the need to comment the code and allow web sites to be maintained more easily.

Microformats also provide an alternative to “entity recognition techniques,” which attempt to identify such “entities” as a postal address by using linguistic grammar or statistical models. Microformats instead describe these entities explicitly ahead of time. While this approach requires that pages be marked up to provide a search engine with such semantics, it is far less error prone.

Because they are based on standard XHTML, microformats can be added to a page without disturbing web crawlers that know nothing about them; valuable information such as a name or address is still delivered as ordinary text. However, as web crawlers begin to recognize and accommodate microformats, a number of advanced scenarios become possible.

Consider a search for “New York pizza.” There’s some useful data there, but without semantics, there’s no way to know whether the user is searching for thin crust-style New York pizza in another city, such as San Francisco, or for a pizza parlor in New York City. Because microformats can be used to mark up specific attributes, such as a type or a location, they are able to make such distinctions.

Instead of relying solely on the search algorithms employed by a search engine like Google, which may or may not return relevant results, microformats provide a decentralized approach to a search that treats the entire web as one massive, structured database. Individuals can mark up such things as their review of the place they ate at last night on their blog, and it will show up when someone searches for reviews of that restaurant. With appropriate markup, the restaurant’s address can be added to an address book with a single click. This decentralization gives search engines the ability to leverage the collective intelligence of a community of web designers in a way that just wasn’t possible before.

Markup Without Complexity
Because XHTML is built on XML, it is useful not just for displaying web pages, but also for general-purpose data exchange. Microformats are much simpler to implement than complex semantic technologies such as RDF and OWL, but there are some basic recommended guidelines and standard practices to be aware of:

  • Microformats should be used to solve a specific problem, such as making contact information actionable.
  • They should be as simple as possible and designed for humans first and machines second.
  • As much as possible, you should reuse building blocks from existing and widely adopted web standards.
  • Keep in mind the general principles of modularity, the ability to embed, and decentralized development.

For a better understanding of how microformats can be used to enable semantic markup for particular types of information, take a closer look at a few of the more useful established types.

Putting Microformats to Work
The hCard type is used for contact information and is based on the vCard standard. By providing structure to news stories and blog posts, which often discuss people, it is possible for web crawlers to retrieve this information and automatically convert it to a vCard, which can be used in a vCard-aware application such as Apple’s Address Book. This conversion makes it easy for people to populate their address books and build contact lists. By applying Cascading Style Sheets (CSS) to hCards you can make them appear however you wish and publish them directly on web pages and blogs. Here’s a sample vCard:

BEGIN:VCARD
VERSION:3.0
N:Sherman;Lee
FN:Lee Sherman
URL:http://www.devx.com/
ORG:Jupitermedia
END:VCARD

Using the hCard microformat markup, the entry becomes:

Lee Sherman
Jupitermedia
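As a rough sketch of the mapping from vCard fields to hCard markup, the Python below reads the sample vCard shown earlier and emits one plausible hCard. The element choices (div, a, span), the `parse_vcard` and `to_hcard` helpers, and the decision to combine url and fn on one link are assumptions for illustration; the class names vcard, fn, url, and org are the ones the hCard specification defines. The vCard reader is deliberately naive and ignores property parameters and folded lines.

```python
from html import escape

# The sample vCard from this article.
VCARD = """BEGIN:VCARD
VERSION:3.0
N:Sherman;Lee
FN:Lee Sherman
URL:http://www.devx.com/
ORG:Jupitermedia
END:VCARD"""


def parse_vcard(text):
    """Naive vCard reader: one NAME:value pair per line, no parameters."""
    fields = {}
    for line in text.strip().splitlines():
        # Split on the first colon only, so URL values keep their own colons.
        name, _, value = line.partition(":")
        fields[name.upper()] = value
    return fields


def to_hcard(fields):
    """Build hCard markup from the FN, URL, and ORG fields, if present."""
    name = escape(fields["FN"])
    if fields.get("URL"):
        name_html = '<a class="url fn" href="{}">{}</a>'.format(
            escape(fields["URL"]), name
        )
    else:
        name_html = '<span class="fn">{}</span>'.format(name)
    lines = ['<div class="vcard">', "  " + name_html]
    if fields.get("ORG"):
        lines.append('  <div class="org">{}</div>'.format(escape(fields["ORG"])))
    lines.append("</div>")
    return "\n".join(lines)


print(to_hcard(parse_vcard(VCARD)))
```

Rendered in a browser without any styling, markup of this kind displays only the name and the organization, while the class names carry the structure for software.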

Here is a more detailed example:

Jupitermedia
150 Executive Park Blvd., Suite 4100
San Francisco, CA 94134
USA
Phone: +1-415-555-1234
Email: [email protected]
Fax: +1-415-555-6789

Where the markup is:

Jupitermedia
150 Executive Park Blvd Suite 4100
San Francisco, CA 94134
USA
Phone: +1-415-555-1234
Fax: +1-415-555-6789

The hCalendar type is used for events and is based on the iCalendar standard. This type is perhaps even more useful than hCard in that it provides a way for desktop-bound calendars such as Outlook on Windows or Apple’s iCal to take advantage of the various online calendars and events listings published by numerous event producers, independent publications, and corporate entities. Web crawlers can automatically convert hCalendar listings to iCalendar listings, where they can then be inserted programmatically into desktop applications.

In addition, by marking up event notices on web pages and blogs, content authors can provide users with an easy way to extract this information, eliminating the need to retype it or copy and paste it into a calendar application. As with hCard, CSS can be used to specify the appearance of an online calendar. For example, this iCalendar event:

BEGIN:VCALENDAR
PRODID:-//XYZproduct//EN
VERSION:2.0
BEGIN:VEVENT
URL:http://www.web2con.com/
DTSTART:20071005
DTEND:20071020
SUMMARY:Web 2.0 Conference
LOCATION:Argent Hotel, San Francisco, CA
END:VEVENT
END:VCALENDAR

would look like this in the hCalendar microformat:

http://www.web2con.com/ Web 2.0 Conference: October 5-19, at the Argent Hotel, San Francisco, CA
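The conversion from hCalendar back to iCalendar properties can be sketched in Python as follows. The sample markup is an assumption modeled on this event: it uses the hCalendar class names vevent, url, summary, dtstart, dtend, and location, and puts the machine-readable dates in the title attribute of abbr elements while the visible text stays human friendly. The `HCalendarParser` class and `hcalendar_to_properties` helper are hypothetical names. The sketch returns a dictionary of property values and leaves out full iCalendar serialization, which also involves text escaping and line folding.

```python
from html.parser import HTMLParser

HCAL_PROPERTIES = ("url", "dtstart", "dtend", "summary", "location")

# Hypothetical markup for the event above. The end date is exclusive in
# iCalendar, so the title says 2007-10-20 while the visible text says 19.
SAMPLE = (
    '<div class="vevent">'
    '<a class="url" href="http://www.web2con.com/">http://www.web2con.com/</a> '
    '<span class="summary">Web 2.0 Conference</span>: '
    '<abbr class="dtstart" title="2007-10-05">October 5</abbr>-'
    '<abbr class="dtend" title="2007-10-20">19</abbr>, at the '
    '<span class="location">Argent Hotel, San Francisco, CA</span>'
    "</div>"
)


class HCalendarParser(HTMLParser):
    """Collects the properties of a single, flat hCalendar event."""

    def __init__(self):
        super().__init__()
        self.event = {}
        self._current = []  # properties whose value is the upcoming text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        self._current = []
        for prop in (attrs.get("class") or "").split():
            if prop not in HCAL_PROPERTIES:
                continue
            if tag == "abbr" and attrs.get("title"):
                # Machine-readable value lives in the title attribute.
                self.event[prop] = attrs["title"]
            elif tag == "a" and prop == "url" and attrs.get("href"):
                self.event[prop] = attrs["href"]
            else:
                self._current.append(prop)

    def handle_endtag(self, tag):
        self._current = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            for prop in self._current:
                self.event[prop] = text


def hcalendar_to_properties(html):
    """Return iCalendar-style property names mapped to their values."""
    parser = HCalendarParser()
    parser.feed(html)
    result = {}
    for prop in HCAL_PROPERTIES:
        if prop in parser.event:
            value = parser.event[prop]
            if prop in ("dtstart", "dtend"):
                value = value.replace("-", "")  # 2007-10-05 becomes 20071005
            result[prop.upper()] = value
    return result


for name, value in hcalendar_to_properties(SAMPLE).items():
    print(name + ":" + value)
```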

Those are two of the most common types of microformats (contacts and events), but with the arrival of Web 2.0, these days most web pages include rich media types such as the videos found on YouTube or specific types of content such as the movie reviews found on Rotten Tomatoes. Wouldn’t it be nice if you could encapsulate this information to be extracted for use in other applications? This kind of encapsulation and extraction is possible with microformats in certain applications; however, it’s important to note that not every media type or content type has a corresponding microformat available currently.

Everyone’s a Critic
A proposed microformat called hReview is designed to provide a way for users to share, distribute, syndicate, and aggregate reviews. The hReview type is designed around a minimal schema that the authors hope will be simple yet flexible enough to accommodate various kinds of reviews for books, movies, restaurants, and so forth on the web today. This application is challenging because review formats vary greatly. Yet, most reviews share some common fields, making it possible to come up with a schema based on this subset.

Here’s an example of how hReview might be used, as taken from the wiki at Microformats.org. The HTML code for a product review looks like this:

Album cover photo: The Postal Service: Give Up. The Postal Service: Give Up

"The people thought they were just being rewarded for treating others as they like to be treated, for obeying stop signs and curing diseases, for mailing letters with the address of the sender.... Don't wake me, I plan on sleeping in..."

"Nothing Better" is a great track on this album, too...

(*****)

Adding hReview to this review requires a few more elements for the rating and reviewer:

Album cover photo: The Postal Service: Give Up. The Postal Service: Give Up

"The people thought they were just being rewarded for treating others as they like to be treated, for obeying stop signs and curing diseases, for mailing letters with the address of the sender.... Don't wake me, I plan on sleeping in..."

"Nothing Better" is a great track on this album, too...

(*****)

Review by Adam Rifkin, February 2005

And this hReview might be presented like this:

[Album cover photo: ] [The Postal Service:] [ Give Up ]
The Postal Service: Give Up
"The people thought they were just being rewarded for treating others as they like to be treated, for obeying stop signs and curing diseases, for mailing letters with the address of the sender... Don't wake me, I plan on sleeping in..."
"Nothing Better" is a great track on this album, too...
(*****)
Review by Adam Rifkin, February 2005.
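To illustrate the aggregation that hReview is meant to enable, the Python sketch below pulls the item name, rating, and reviewer out of two review fragments and averages the ratings per item. The markup is a simplified assumption that uses the hReview class names hreview, item, fn, rating, and reviewer; the second review and its reviewer are invented purely for the example, and the `ReviewParser` and `average_ratings` names are hypothetical. The sketch assumes the item block appears before the reviewer block, as it does here.

```python
from collections import defaultdict
from html.parser import HTMLParser
from statistics import mean

# Simplified, hypothetical hReview fragments. The second one is invented.
REVIEWS = [
    '<div class="hreview">'
    '<div class="item"><span class="fn">The Postal Service: Give Up</span></div>'
    '(<abbr class="rating" title="5">*****</abbr>)'
    '<p class="reviewer vcard">Review by <span class="fn">Adam Rifkin</span></p>'
    "</div>",
    '<div class="hreview">'
    '<div class="item"><span class="fn">The Postal Service: Give Up</span></div>'
    '(<abbr class="rating" title="3">***</abbr>)'
    '<p class="reviewer vcard">Review by <span class="fn">Pat Example</span></p>'
    "</div>",
]


class ReviewParser(HTMLParser):
    """Extracts item, rating, and reviewer from one hReview fragment."""

    def __init__(self):
        super().__init__()
        self.review = {}
        self._context = None  # "item" or "reviewer", whichever was seen last
        self._capture = None  # key that the next text node should fill

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "item" in classes:
            self._context = "item"
        if "reviewer" in classes:
            self._context = "reviewer"
        if "rating" in classes and tag == "abbr" and attrs.get("title"):
            self.review["rating"] = float(attrs["title"])
        # fn names either the item or the reviewer, depending on context.
        self._capture = self._context if "fn" in classes else None

    def handle_endtag(self, tag):
        self._capture = None

    def handle_data(self, data):
        text = data.strip()
        if text and self._capture:
            self.review[self._capture] = text


def average_ratings(fragments):
    """Return the mean rating per item across all review fragments."""
    ratings = defaultdict(list)
    for fragment in fragments:
        parser = ReviewParser()
        parser.feed(fragment)
        review = parser.review
        if "item" in review and "rating" in review:
            ratings[review["item"]].append(review["rating"])
    return {item: mean(values) for item, values in ratings.items()}


print(average_ratings(REVIEWS))
```

Because each reviewer publishes the markup on their own site, an aggregator only needs to crawl and parse; no central review database is required.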

Besides the microformat types discussed here, there are several others that have been designed to enable semantic markup for specific types of information:

  • hAtom (hAtom spec.): for marking up Atom feeds from within standard HTML
  • hResume (hResume spec.): for résumés or CVs
  • rel-directory (rel-directory spec.): for creating and including distributed directories
  • rel-nofollow: an attempt to discourage third-party content spam (for example, spam in blogs)
  • rel-tag (rel-tag spec.): for decentralized tagging (folksonomy)
  • xFolk (xFolk spec.): for tagged links
  • XFN: for social relationships
  • XOXO: for lists and outlines

As mentioned previously, the microformats specifications are a community effort, and they are open and widely available. Support for microformats is growing. They are expected to be supported natively by forthcoming browsers such as Firefox Version 3 and Microsoft Internet Explorer 8. Developers who want to get an early start on using microformats can make use of the Firefox extension Operator, which is able to detect common microformats such as hCard, hCalendar, geo, hReview, and rel-tag.

Though they are relatively new, microformats are gaining support from Google and Yahoo! and are already in use by such web services as Yahoo! Local, Flickr, and Upcoming.org. They represent the next stage in the evolution of the web by moving beyond web sites as static “brochureware” and providing a simple approach to building semantics into current web site design and development. As the industry shifts from the original web and more recent dynamic Web 2.0 technologies to the coming Web 3.0 by way of semantic technologies, we should see the Internet provide the ability to work with web content and behave as if it is all implemented in one enormous application that anticipates the needs of its users and acts accordingly. Microformats can help you get there from here by providing a simple way to embed semantics in web pages.
