It's All in the Metadata
Existing web standards, when used appropriately, are able to convey a great deal of semantic meaning. XHTML tags such as <address> and <blockquote>, and attributes such as rel and title, are used to create blocks of code that can be retrieved and reused as needed. As XHTML evolves and additional schemas emerge, it will become even easier to embed semantics in web pages. For example, the microformat rel="nofollow" can be added to an anchor tag to indicate that search engines should ignore links explicitly tagged this way.
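As a rough illustration of how a crawler might honor rel="nofollow", the sketch below uses Python's standard html.parser module to sort anchor links into those it may follow and those it should skip. The class name LinkCollector, the helper classify_links, and the sample URLs are hypothetical and chosen only for this example; a real search engine's handling is certainly more involved.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects anchor hrefs, separating those marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.follow = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel holds a space-separated list of values, so split before testing.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollow.append(href)
        else:
            self.follow.append(href)


# Hypothetical sample markup; the URLs are placeholders.
SAMPLE = """
<p>
  <a href="http://example.com/about">About us</a>
  <a href="http://example.org/ad" rel="nofollow">Sponsored link</a>
  <a href="http://example.net/post" rel="external nofollow">Comment link</a>
</p>
"""


def classify_links(markup):
    """Return (links to follow, links marked nofollow) for the given markup."""
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.follow, collector.nofollow


follow, nofollow = classify_links(SAMPLE)
print("follow:", follow)
print("skip:", nofollow)
```

Because rel can carry several space-separated values, the sketch splits the attribute rather than comparing the whole string, so a link tagged rel="external nofollow" is still skipped.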
Microformats can also be used to increase the relevance of search results by giving search engines the ability to parse and act on metadata. Microformat-aware browsers will be able to parse this data as well, making it far easier to integrate the web with desktop applications that make use of it.
Good metadata has always been the key to the accuracy and efficiency of search engines; however, today we often find ourselves at the mercy of legacy approaches where no extra effort was made or required to build metadata into the markup. The use of microformats improves this situation to a certain extent by tying metadata to the markup itself. With explicit, rather than implicit, semantics, the search engine's job becomes easier.
Today's web sites and rich Internet applications (RIAs) are also collaborative efforts in which the person who built the site may not be the same person maintaining it. Human-readable formats minimize the need to comment the code and allow web sites to be maintained more easily.
Microformats also provide an alternative to "entity recognition techniques," which attempt to identify "entities" such as a postal address by applying linguistic grammar or statistical models to unstructured text. Microformats instead describe these entities explicitly ahead of time. While this approach requires that pages be marked up to provide a search engine with such semantics, it is far less error prone.
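To make the contrast with entity recognition concrete, here is a minimal sketch that reads a postal address directly from class names instead of guessing at it from free text. The class names (vcard, fn, adr, street-address, locality, region, postal-code) come from the hCard microformat; the parser class AdrParser, the helper extract_address, and the sample business and address are hypothetical. The sketch assumes well-formed XHTML in which every element is explicitly closed.

```python
from html.parser import HTMLParser

# hCard address properties this sketch looks for.
ADR_PROPERTIES = {"street-address", "locality", "region", "postal-code"}


class AdrParser(HTMLParser):
    """Pulls address fields out of elements whose class names an adr property."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        # One entry per open element: the property it represents, or None.
        self._stack = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in ADR_PROPERTIES), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        prop = self._stack[-1] if self._stack else None
        text = data.strip()
        if prop and text:
            self.fields[prop] = self.fields.get(prop, "") + text


# Hypothetical sample; the business and address are made up.
SAMPLE = """
<div class="vcard">
  <span class="fn org">Example Pizza</span>
  <div class="adr">
    <span class="street-address">123 Example Street</span>,
    <span class="locality">New York</span>,
    <span class="region">NY</span>
    <span class="postal-code">10001</span>
  </div>
</div>
"""


def extract_address(markup):
    """Return a dict of the address fields found in the markup."""
    parser = AdrParser()
    parser.feed(markup)
    parser.close()
    return parser.fields


address = extract_address(SAMPLE)
print(address)
```

No grammar or statistical model is involved: the parser only has to match class names, which is why explicit markup tends to be less error prone than inferring the same fields from prose.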
Because they are based on standard XHTML, microformats can deliver valuable information, such as a name or address, to a web crawler even when the crawler knows nothing about microformats. However, as web crawlers begin to adjust and accommodate microformats, a number of advanced scenarios become possible.
Consider a search for "New York pizza." There's some useful data there, but without semantics, there's no way to know whether the user is searching for thin-crust, New York-style pizza in another city, such as San Francisco, or for a pizza parlor in New York City. Because microformats can be used to mark up specific attributes, such as a type or a location, they make such distinctions possible.
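The "New York pizza" ambiguity can be sketched in a few lines, assuming each listing marks up its name, type, and city with the hCard class names fn, category, and locality. The ListingParser class, the helper functions, and both sample restaurants are hypothetical; the point is only that a query on the location field and a query on the type field return different listings.

```python
from html.parser import HTMLParser

# hCard properties this sketch reads from each listing.
PROPERTIES = {"fn", "category", "locality"}


class ListingParser(HTMLParser):
    """Builds one dict per vcard element, keyed by the properties above."""

    def __init__(self):
        super().__init__()
        self.listings = []
        self._stack = []  # property name (or None) for each open element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "vcard" in classes:
            self.listings.append({})  # a new listing starts here
        prop = next((c for c in classes if c in PROPERTIES), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        prop = self._stack[-1] if self._stack else None
        text = data.strip()
        if prop and text and self.listings:
            self.listings[-1][prop] = text


# Hypothetical listings; names and details are made up.
SAMPLE = """
<div class="vcard">
  <span class="fn org">Hypothetical Slice House</span>
  <span class="category">New York-style pizza</span>
  <span class="adr"><span class="locality">San Francisco</span></span>
</div>
<div class="vcard">
  <span class="fn org">Example Pizzeria</span>
  <span class="category">Pizza</span>
  <span class="adr"><span class="locality">New York</span></span>
</div>
"""


def parse_listings(markup):
    parser = ListingParser()
    parser.feed(markup)
    parser.close()
    return parser.listings


def located_in(listings, city):
    """Names of listings whose marked-up locality equals the city."""
    return [l.get("fn") for l in listings
            if l.get("locality", "").lower() == city.lower()]


def in_category(listings, keyword):
    """Names of listings whose marked-up category contains the keyword."""
    return [l.get("fn") for l in listings
            if keyword.lower() in l.get("category", "").lower()]


listings = parse_listings(SAMPLE)
print(located_in(listings, "New York"))
print(in_category(listings, "New York-style"))
```

A plain keyword match on "New York" would hit both pages; with the fields marked up separately, the location query finds only the New York parlor and the type query finds only the San Francisco one.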
Instead of relying solely on the search algorithms employed by a search engine like Google, which may or may not return relevant results, microformats provide a decentralized approach to search that treats the entire web as one massive, structured database. Individuals can mark up content on their blogs, such as a review of the restaurant where they ate last night, and it will show up when someone searches for reviews of that restaurant. With appropriate markup, the restaurant's address can be added to an address book with a single click. This decentralization gives search engines the ability to leverage the collective intelligence of a community of web designers in a way that just wasn't possible before.
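The single-click address book scenario amounts to converting marked-up fields into a format that contact applications import, such as vCard. The sketch below assumes the fields have already been extracted from the page into a dictionary keyed by hCard property names; the function names escape and to_vcard and the sample data are hypothetical, and the output is a simplified vCard 3.0 record rather than a complete implementation of that format.

```python
def escape(value):
    """Backslash-escape the characters that are special in vCard text values."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,"))


def to_vcard(fields):
    """Build a simplified vCard 3.0 record from hCard-style fields."""
    # ADR has seven components: PO box; extended address; street;
    # locality; region; postal code; country. Unused ones stay empty.
    adr = ";".join([
        "",
        "",
        escape(fields.get("street-address", "")),
        escape(fields.get("locality", "")),
        escape(fields.get("region", "")),
        escape(fields.get("postal-code", "")),
        escape(fields.get("country-name", "")),
    ])
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        "N:;;;;",  # structured name left empty for an organization
        "FN:" + escape(fields.get("fn", "")),
        "ADR:" + adr,
        "END:VCARD",
    ]
    # vCard lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


# Hypothetical fields, as they might be read from a restaurant's hCard.
restaurant = {
    "fn": "Example Pizzeria",
    "street-address": "123 Example Street",
    "locality": "New York",
    "region": "NY",
    "postal-code": "10001",
}

vcard = to_vcard(restaurant)
print(vcard)
```

A microformat-aware browser or extension could hand text like this to the desktop address book, which is the kind of web-to-desktop integration described earlier.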
Markup Without Complexity
Because XHTML is built on XML, it is useful not just for displaying web pages, but also for general-purpose data exchange. Microformats are much simpler to implement than complex semantic technologies such as RDF and OWL, but there are some basic recommended guidelines and standard practices to be aware of:
- Microformats should be used to solve a specific problem, such as making contact information actionable.
- They should be as simple as possible and designed for humans first and machines second.
- As much as possible, you should reuse building blocks from existing and widely adopted web standards.
- Keep in mind the general principles of modularity, the ability to embed, and decentralized development.
For a better understanding of how microformats can be used to enable semantic markup for particular types of information, take a closer look at a few of the more useful established types.