Web developers awake! XHTML (Extensible HyperText Markup Language) is coming to a server near you. It’ll change everything you ever knew about Web design, give you untold power on the client and the server, and solve one of the great nagging problems of how to create a Web site without spending billions of dollars on versions for Internet Explorer, Mozilla, AOL, Palm Pilot, your telephone…well, you get the idea. On January 26th, the World Wide Web Consortium (W3C) released the first upgrade of the HTML 4.0 standards in more than a year. Surprisingly, this upgrade wasn’t intended to add a few more tags or incorporate a couple of CSS extensions into the language. Instead, the XHTML 1.0 standard (located at www.w3.org/TR/xhtml1) ceased being HTML (see the sidebar, “History of HTML”).
An XHTML document, in the main, doesn’t appear all that radically different from a “normal” HTML document (see Listing 1). The root of such a document is still an <html> node, the document is divided into a <head> and a <body> section, and the tag usage is consistent with what has been produced in HTML editors or by hand for the last decade. However, you will notice some differences. The first has to do with the fact that this is an XML document. It begins with the XML declaration (the line that starts with <?xml), which tells the parser both that this is an XML document and that it uses the standard 8-bit encoding scheme of most typical English documents.
The document’s DOCTYPE declaration is likewise a little different from the norm; it points to the XHTML DTD rather than the HTML 4.0 DTD. One of the big controversies surrounding the XHTML specification had to do with a fight between two distinct factions in the W3C. One group wanted to define only one DTD for the specification, arguing that it would help keep the language simple. The other group felt that there should be three distinct DTDs for three different types of XHTML:
- Strict. The core HTML within the document follows clearly delineated constraints, and any non-HTML code added to it must be added under a separate namespace.
- Transitional. While the HTML contained in the document has to be XML conformant, the requirements about which elements can be contained where are much less strict, and you don’t need a namespace to declare specific non-HTML-based tags. This is primarily a way to start moving other tag-based formatting standards, such as ColdFusion or ASP, into the domain of XML. As its name implies, it is generally considered a transitional state, and should be used principally for older HTML documents being converted into XHTML.
- Frameset. Frames are for the most part independent of the content that they contain. Because they’re essentially meta-structures, the W3C decided to pull frames out of the base XHTML format and give them a distinct DTD of their own.
The XHTML 1.0 Recommendation of January 2000 took this latter approach, with three distinct DTDs that you can specify in the DOCTYPE declaration. In practice, unless you work heavily with frames, you will probably only need to worry about the strict DTD.
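To make these pieces concrete, here is a minimal sketch in Python that holds a small XHTML 1.0 Strict document in a string and runs it through an ordinary XML parser. The page title, the paragraph text, and the choice of UTF-8 as the encoding are assumptions made for illustration; the DOCTYPE identifiers and the namespace URI are the ones published in the XHTML 1.0 Recommendation.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A minimal XHTML 1.0 Strict document. The title and paragraph text are
# placeholders, and UTF-8 is an assumed encoding for this sketch.
DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>A minimal XHTML page</title>
  </head>
  <body>
    <p>Hello, XHTML.</p>
  </body>
</html>
"""


def top_level_sections(source: str) -> list[str]:
    """Parse the document as XML and return the local names of the root's children."""
    root = ET.fromstring(source.encode("utf-8"))
    # ElementTree reports names as "{namespace}local"; keep only the local part.
    return [child.tag.split("}", 1)[1] for child in root]


root = ET.fromstring(DOCUMENT.encode("utf-8"))
print(root.tag)                      # the root element, qualified by its namespace
print(top_level_sections(DOCUMENT))  # the two sections of the document
```

The parser used here is non-validating: it checks that the document is well-formed XML but does not fetch the DTD named in the DOCTYPE declaration.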
Namespaces have become fairly common in XML circles lately, but if you’re working in a strictly HTML environment, chances are you’ve not encountered them before. Namespaces serve a simple purpose: they identify a set of tags as belonging to one particular object description. It’s entirely possible that two XML structures might be used together (in XHTML, it’s almost certain), and you need some way of distinguishing between a tag in one structure and an identically named tag in the other.
A namespace associates a specific prefix, a short name or even a single letter, with a URI (Uniform Resource Identifier) as a way of identifying the namespace uniquely. The URI is not required to actually point to anything (indeed, most of the common ones don’t); it only has to identify the namespace uniquely relative to the other namespaces in the document. For example, a declaration of the form xmlns=”http://www.w3.org…” on the root element identifies the default namespace for the document, which specifies that unprefixed tags will use the XHTML standard for display.
The default namespace is one where the tags don’t require a prefix to identify them. A second declaration (xmlns:emp=”http://www.myCompany.com/ ..”) then defines a second namespace, which indicates that any element that begins with the prefix “emp:” should be considered to be part of the employee namespace for your company. Thus, you may have an XML structure much like Listing 2, where an XHTML document contains an embedded XML island.
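The sketch below shows how a parser keeps the two vocabularies apart. It is written in Python; the employee namespace URI and the element names inside the island are hypothetical stand-ins, since the company URI above is abbreviated. Note that both vocabularies contain an element named title, and the namespace is what tells them apart.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EMP_NS = "http://example.com/ns/employee"  # hypothetical URI, for illustration only

# An XHTML page with an embedded "island" of employee markup.
PAGE = f"""<html xmlns="{XHTML_NS}" xmlns:emp="{EMP_NS}">
  <head><title>Staff</title></head>
  <body>
    <h1>Staff directory</h1>
    <emp:employee>
      <emp:name>Pat Example</emp:name>
      <emp:title>Editor</emp:title>
    </emp:employee>
    <p>Unprefixed tags such as this paragraph belong to XHTML.</p>
  </body>
</html>"""

root = ET.fromstring(PAGE)
# Prefixes used in queries are local to this mapping; only the URIs matter.
ns = {"x": XHTML_NS, "emp": EMP_NS}

employee_names = [e.text for e in root.findall(".//emp:employee/emp:name", ns)]
job_title = root.find(".//emp:title", ns).text     # the employee's title
page_title = root.find("x:head/x:title", ns).text  # the XHTML page title

print(employee_names, job_title, page_title)
```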
This ability to separate namespaces is an important aspect of XHTML, although to really appreciate its significance, it is worth shifting your viewpoint about HTML from that of a markup language to one where HTML provides the definition of a document object that is in turn made up of paragraph objects, list objects, header objects, form objects, and so forth. The XHTML namespace describes a collection of document objects. A different namespace describes a different object model: a different view of reality that’s focused on objects such as employees and addresses. When you combine two such namespaces, you define relationships between the two object collections. For example, this section of the document focuses on employees, that HTML table is linked to this other site of financial information, and so forth. This has benefits both for creating sophisticated server-side code for displaying such information and for creating modular output that contains subsets of HTML for different platforms.
Modularization plays a big part in XHTML, and will play an even bigger part as the specification evolves. One of the chief problems that affects traditional HTML is that it is fundamentally monolithic: as a browser manufacturer, you either implement all of it, or you are non-conformant (which almost all browsers are, with a few mostly academic exceptions). The primary consequence is that, with the advent of Internet-aware PDAs (personal digital assistants) and dedicated WebTV-like devices, you are seeing any number of devices that simply don’t have the bandwidth to support the full specification, and so miss critical pieces of it.
Recognizing that this consequence is unavoidable, the XHTML recommendation is moving to a modular approach for the specification. Rather than defining a single standard, the XHTML specification defines a core set of “basic” tags that should be considered the minimal level of support (primarily for PDAs and hand-held Internet devices), then adds modules that can be used to expand upon this core set. The principal set of modules that defined XHTML 1.0 is summarized in Table 1.
One thing that may become more evident after a few minutes studying the modules is that, for the most part, they don’t make any major changes to the HTML 4.0 specification. This was deliberate: XHTML 1.0 is a means to convert HTML 4.0 (actually HTML 4.01, but the differences there are subtle) into an XML specification.
However, the modularization that forms the basis for XHTML 1.1 was done because the W3C realized that HTML 4.0 in and of itself isn’t sufficient to handle expanding the language. A browser manufacturer could create a proprietary extension module that would enable specialized support for that browser. For example, mobile phone companies may want to include an extension to the XHTML specification that would make voice-specific elements available: elements (or attributes) for specifying tonal qualities in synthetic speech agents, language attributes for handling dialect differences between speakers, and so forth. This extension would be incorporated into a namespace that non-audio clients could simply ignore, because they wouldn’t recognize the namespace extension for voice interactions, or that XSL scripts on the server could strip out depending upon the client. Similarly, such servers could work in the other direction, encoding XHTML code with VoxML (Voice Markup Language, a voice transcription and recognition format) or similar extensions when “talking” to a voice-enabled client.
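The server-side filtering described here can be sketched in a few lines. The article has XSL doing this job; the Python below is a stand-in that uses the standard library’s ElementTree instead, and the voice namespace URI, the v:prompt element, and the v:pitch attribute are all invented for the example rather than taken from any real voice specification.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
VOICE_NS = "http://example.com/ns/voice"  # hypothetical extension namespace

PAGE = f"""<html xmlns="{XHTML_NS}" xmlns:v="{VOICE_NS}">
  <body>
    <p v:pitch="low">Traffic is heavy on the bridge.</p>
    <v:prompt>Say "next" for the next story.</v:prompt>
  </body>
</html>"""


def strip_namespace(element: ET.Element, namespace: str) -> None:
    """Remove, in place, every descendant element and attribute in `namespace`.

    Text that directly follows a removed element (its tail) is dropped with it.
    """
    marker = "{" + namespace + "}"
    for name in [a for a in element.attrib if a.startswith(marker)]:
        del element.attrib[name]
    for child in list(element):
        if child.tag.startswith(marker):
            element.remove(child)
        else:
            strip_namespace(child, namespace)


root = ET.fromstring(PAGE)
strip_namespace(root, VOICE_NS)  # what a non-audio client would be sent

remaining_tags = [el.tag.split("}", 1)[1] for el in root.iter()]
remaining_text = "".join(root.itertext())
print(remaining_tags)
```

A voice-enabled client would simply be sent the document without this filtering step.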
So when can you run an XHTML document? Well, with a few minor constraints, right now. Most browsers that are currently in use are non-validating; they don’t check whether the HTML is completely valid, and for the most part will let wildly non-compliant HTML pass through unhindered because the rendering engine (the part of the browser that interprets and displays the HTML) is given extremely wide latitude in handling output.
Ironically, this leniency shouldn’t be true of XHTML. XHTML works upon the assumption that the code is pure XML, and an XML parser should complain if the XHTML being passed in isn’t completely valid. Fortunately, the rules for turning “normal” HTML into XHTML are quite simple.
First, all elements are containers, and must be closed. Any time you create a tag (such as <p>, for paragraph), you must make sure to have a closing tag (</p>) that closes the current tag. If a tag contains no text or inner elements, it can be terminated with a />. For example, in HTML, the image tag is expressed as <img>, while in XML, the same tag should either be closed explicitly, as <img></img>, or terminated within the bracket itself, as <img />. In addition, all attribute values must be enclosed within either single or double quote marks. A practice that, unfortunately, is still carried on by most HTML editors is to leave attribute values unquoted (especially id attributes). This means that even if you follow the rules about elements, almost all HTML editors will not produce valid XHTML.
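Attributes must always have values. A few attributes in HTML, such as the selected attribute of an <option> element, are traditionally written as a bare name with no value. In XHTML they must be written out in full, as in selected=”selected”.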
One element cannot overlap another element without containing it completely. The expression <b><i>a test</i></b> is valid, but <b><i>a test</b></i> is not, because the italics tag overlaps the bold boundaries. Just remember that an XHTML element is a container, and this error will usually not happen.
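One way to see these rules in action is to feed small fragments to an XML parser and note which ones it rejects. The sketch below does that in Python. The fragments, including the logo.gif file name and the id value, are made up for the example, and the check covers XML well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if an XML parser accepts the fragment, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


FRAGMENTS = [
    "<div><p>one</p><p>two</p></div>",        # every element closed
    "<div><p>one<p>two</div>",                # paragraphs never closed
    '<img src="logo.gif" alt="Logo"></img>',  # empty element closed explicitly
    '<img src="logo.gif" alt="Logo" />',      # empty element terminated with />
    '<img src="logo.gif" alt="Logo">',        # HTML style, never closed
    "<p id=intro>text</p>",                   # attribute value not quoted
    "<option selected>A</option>",            # attribute with no value
    '<option selected="selected">A</option>', # attribute written out in full
    "<b><i>a test</i></b>",                   # properly nested
    "<b><i>a test</b></i>",                   # overlapping elements
]

for fragment in FRAGMENTS:
    verdict = "ok" if is_well_formed(fragment) else "rejected"
    print(f"{verdict:9}{fragment}")
```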
To keep text within an XHTML element from being treated as markup, enclose it in a CDATA section. CDATA sections are XML constructs that tell the XML parser to not parse anything within the expression. CDATA sections are delimited by the starting character sequence <![CDATA[ and the ending sequence ]]>. This is especially useful with scripts, where the < and > signs may be used for “less than” and “greater than” respectively, not as the start or end of a tag. In general, if you are conscientious about closing or terminating tags and enclosing attribute values, you’ll head most XML errors off at the pass.
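A short Python sketch shows the difference a CDATA section makes. The script text is an invented one-liner; the point is only that the same characters are rejected when bare and passed through untouched when wrapped.

```python
import xml.etree.ElementTree as ET

SCRIPT_TEXT = "if (a < b && b > c) { swap(); }"  # made-up script body

BARE = f"<script>{SCRIPT_TEXT}</script>"
WRAPPED = f"<script><![CDATA[{SCRIPT_TEXT}]]></script>"

# Without CDATA, the parser tries to read "<" and "&" as markup and gives up.
try:
    ET.fromstring(BARE)
    bare_parses = True
except ET.ParseError:
    bare_parses = False

# Inside CDATA, the same characters come through as plain text.
wrapped_text = ET.fromstring(WRAPPED).text

print(bare_parses)
print(wrapped_text)
```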
While the rules for using XHTML are simple, in some (perhaps most) cases, this simple change can play havoc with archived HTML material. What benefits does anyone derive by using XHTML over traditional HTML? To really appreciate the benefits of XHTML, it’s worth understanding a little bit about the true value of XML in the first place. XML isn’t HTML with custom tags, even though that was one of the primary rationales for creating XML in the first place. XML is a language for representing complex relationships between objects (hearkening to the object model discussed previously). Moreover, one of the principal technologies at work with XML is the Extensible Stylesheet Language (XSL).
If you are familiar with Cascading Style Sheets, you may think that an XSL style sheet is simply a different way of expressing styles. It isn’t. XSL is a technology written in XML that can associate given element patterns in an XML document with other collections of strings (or better yet, with elements from a different XML form). XSL can take an XML document as input and convert it into HTML, for example. However, if that HTML document isn’t also an XML document, then most XSL processors have to manipulate the HTML as strings, which is much less efficient than manipulating the elements as internal binary objects.
Moreover, XSL can’t transform HTML into XML unless the HTML is also well-formed XML. On the other hand, with XHTML you can perform a direct transformation into another XML structure (or even into different XHTML), pass an XHTML document through but change only a few selected elements, or retrieve information that may be contained in an HTML table and convert it into a different XML structure.
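As a rough illustration of the last case, the sketch below pulls the contents of an XHTML table into a different XML structure. An XSL style sheet would be the natural tool for this; the Python here is a stand-in built on the standard library’s ElementTree, and the table contents and the sales and sale element names are invented for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
X = {"x": XHTML_NS}

# Invented sample data: a header row followed by two data rows.
TABLE = f"""<table xmlns="{XHTML_NS}">
  <tr><th>region</th><th>quarter</th><th>amount</th></tr>
  <tr><td>Northwest</td><td>Fall 1999</td><td>21500000</td></tr>
  <tr><td>Southwest</td><td>Fall 1999</td><td>18250000</td></tr>
</table>"""


def table_to_records(table: ET.Element) -> ET.Element:
    """Turn each data row into a <sale> element whose children are named after the headers."""
    rows = table.findall("x:tr", X)
    headers = [th.text for th in rows[0].findall("x:th", X)]
    sales = ET.Element("sales")
    for row in rows[1:]:
        sale = ET.SubElement(sales, "sale")
        for name, cell in zip(headers, row.findall("x:td", X)):
            ET.SubElement(sale, name).text = cell.text
    return sales


sales = table_to_records(ET.fromstring(TABLE))
print(ET.tostring(sales, encoding="unicode"))
```

This only works because the table is well-formed XML to begin with, which is exactly what XHTML guarantees and ordinary HTML does not.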
For example, you could create a relatively simple XSL transform that would read through an XHTML document and convert any expressions of the form
Similarly, you could store XHTML blocks within an XML document or server that could then be easily retrieved through the use of another XML technology: XPath. XPath gives you a way to perform fairly complex queries on the nodes and extract data at any level of complexity. Most traditional index servers use a brute force method to index a site, recording the positions of given words in their respective files. The results of such searches are thus only as current as the last time the site was indexed, and moreover prove problematic when dealing with dynamically generated data.
With XHTML, on the other hand, the information can be retrieved much more topically. If you know the structure of the documents at hand, you could specifically retrieve only those sections within documents that pertain to the current record. For example, you could retrieve only the tables in a document, or only Table 3 in the document, or only tables that contained Northwest sales amounts for Fall 1999 in excess of $20,000,000. This level of specificity compares favorably to that of SQL, without any of the headaches of dealing with JOINs and trying to reconstruct hierarchical data.
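The three queries just mentioned might look something like the following Python sketch. It relies on ElementTree, which supports only a subset of XPath, so the numeric comparison is done in ordinary Python rather than inside the path expression. The report document, its id values, and the figures in it are invented sample data.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
X = {"x": XHTML_NS}

# Invented sample report containing three tables.
REPORT = f"""<html xmlns="{XHTML_NS}">
  <body>
    <table id="staff"><tr><td>Editorial</td><td>12</td></tr></table>
    <table id="costs"><tr><td>Printing</td><td>40000</td></tr></table>
    <table id="sales">
      <tr><td>Northwest</td><td>Fall 1999</td><td>21500000</td></tr>
      <tr><td>Southwest</td><td>Fall 1999</td><td>18250000</td></tr>
    </table>
  </body>
</html>"""

body = ET.fromstring(REPORT).find("x:body", X)

all_tables = body.findall("x:table", X)   # only the tables in the document
third_table = body.find("x:table[3]", X)  # only Table 3 (positions start at 1)


def rows_over(table: ET.Element, region: str, quarter: str, minimum: int) -> list[list[str]]:
    """Return the rows for one region and quarter whose amount exceeds `minimum`."""
    matches = []
    for row in table.findall("x:tr", X):
        cells = [td.text for td in row.findall("x:td", X)]
        if cells[:2] == [region, quarter] and int(cells[2]) > minimum:
            matches.append(cells)
    return matches


big_northwest = rows_over(third_table, "Northwest", "Fall 1999", 20_000_000)
print(len(all_tables), third_table.get("id"), big_northwest)
```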
Yet the biggest benefit to XHTML is that it makes it possible to target your data to any device whatsoever. Consider a news site such as CNN.COM. The site could conceivably (though they don’t now) produce basic XML information consisting of news stories with specific key words denoted to add context to the information, perhaps coupled with multimedia such as audio files, transcriptions, video streams with SMIL-based timing to handle interactive charts rendered in SVG (Scalable Vector Graphics), and dynamic links. The top stories are dispensed into an XML-formatted document that is used to retrieve salient information for teasers, coupled with a second XML document consisting of advertising media links that are keyed upon specific elements in the stories themselves (a fashion show might feature clothing and makeup advertisements, a football story would show beer, a terrorist incident would show life insurance information, and so forth).
When you connect to CNN.COM, the server queries your browser and determines which modules of XHTML your client supports. Your Internet Explorer or Mozilla browser running on a high-end machine on a T3 might receive the full treatment: the aggregate XML gets filtered through XSL to produce full multimedia streams which your client can then filter and display based upon its own built-in XSL transforms. The Palm Pilot gets XHTML Lite, given the relevant stories but with limited graphics (although perhaps with keys in the XHTML so that the parser can retrieve specific information from the stories themselves for synopsis and later retrieval). The cell phone would get headlines, text, and basic links, but could be switched into audio mode (and output through your car’s stereo system) so that the stories could be read by voice software, and could in turn send signals back based upon vocal commands (encoded in VoxML) to the server to change the story and retrieve the latest highway report.
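A heavily simplified sketch of that idea follows: one XML story, several outputs, with the choice driven by what the client can display. It is written in Python rather than XSL, and everything in it is hypothetical, including the story markup, the client names, and the single “images” capability flag. A real server would negotiate far more than this.

```python
import xml.etree.ElementTree as ET

# Invented source story; in the scenario above this would come from the news feed.
STORY = """<story>
  <headline>Bridge reopens after repairs</headline>
  <summary>The bridge reopened to traffic on Monday.</summary>
  <image href="bridge.jpg"/>
</story>"""

# Hypothetical capability table: what each kind of client can display.
PROFILES = {
    "desktop": {"images": True},
    "pda": {"images": False},
    "phone": {"images": False},
}


def render(story_xml: str, client: str) -> ET.Element:
    """Build a block of markup for one story, tailored to the client's profile."""
    story = ET.fromstring(story_xml)
    # Unknown clients get the most conservative output.
    profile = PROFILES.get(client, {"images": False})
    div = ET.Element("div")
    ET.SubElement(div, "h1").text = story.findtext("headline")
    ET.SubElement(div, "p").text = story.findtext("summary")
    image = story.find("image")
    if profile["images"] and image is not None:
        ET.SubElement(div, "img", src=image.get("href"), alt=story.findtext("headline"))
    return div


for client in PROFILES:
    print(client, ET.tostring(render(STORY, client), encoding="unicode"))
```

The story itself is written once; only the rendering step knows anything about the device on the other end.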
This isn’t a fantasy. All of the technologies described here are currently doable. Moreover, while it is certainly possible to build formatting software for either producing or extracting information from regular text streams (such as through ASP or JSP), such software has to be custom written for every format change, typically at incredible expense. With XHTML (and XML in general), you can specify the transforms that you want, and use them in highly modular fashions, designing only those pieces that affect a small piece of the stream.
You want to change the look of the site? Change the XSL filter. You want to target the latest holographic browser (okay, maybe there is a little fantasy here) with your server? Pull in the browser’s XHTML extension from the manufacturer’s Web site (if it’s not already cached), use XSL to aggregate the profile elements into a schema, then use another XSL transform to output the results into 3DML (3-Dimensional Markup Language) with a side stream in XHTML for the attached 2D browser.
If you wanted to purchase the cool computer shown in the holographic viewer, your forms browser would in turn send an envelope of form data back to the CNN XML server, which would convert it into an XML-based purchase order supported by the vendor of the product. The vendor, in turn, would send a payment request to your bank (which, in turn, sends more XHTML back to you to authenticate the purchase).
Put another way, the advantage to XHTML is that it becomes a part of the XML pipeline, and a fairly transparent part that does not need to be hand coded. Certainly it can be, since much of the Web is still made up of sites that are hand coded because they are expressions of art rather than commerce. But XHTML will likely end up changing the way that most Web sites handle almost all of their output, and can free up people from the relatively mundane tasks of formatting content and move them into the more challenging roles of creating the content in the first place.