

XHTML: HTML Merges With XML  : Page 2

The W3C's recently approved XHTML standard combines HTML and XML and makes it possible for your Web pages to be viewed on a wider variety of devices.


Modularization plays a big part in XHTML, and will play an even bigger part as the specification evolves. One of the chief problems with traditional HTML is that it is fundamentally monolithic: as a browser manufacturer, you either implement all of it, or you are non-conformant (which almost all browsers are, with a few mostly academic exceptions). The primary consequence is that, with the advent of Internet-aware PDAs (personal digital assistants) and dedicated WebTV-like devices, a growing number of devices simply don't have the resources to support the full specification, and so miss critical pieces of it.

Recognizing that this consequence is unavoidable, the XHTML recommendation is moving to a modular approach. Rather than defining a single standard, the XHTML specification defines a core set of "basic" tags that should be considered the minimal level of support (primarily for PDAs and hand-held Internet devices), then adds modules that can be used to expand upon this core set. The principal set of modules that define XHTML 1.0 is summarized in Table 1.

One thing that becomes evident after a few minutes of studying the modules is that, for the most part, they don't make any major changes to the HTML 4.0 specification. This was deliberate: XHTML 1.0 is a means to convert HTML 4.0 (actually HTML 4.01, but the differences there are subtle) into an XML specification.

However, the modularization that forms the basis for XHTML 1.1 was done because the W3C realized that HTML 4.0 in and of itself isn't sufficient to handle expanding the language. A browser manufacturer could create a proprietary extension module that would enable specialized support for that browser. For example, mobile phone companies may want to include an extension to the XHTML specification that would make voice-specific elements available: elements (or attributes) for specifying tonal qualities in synthetic speech agents, language attributes for handling dialectal differences between speakers, and so forth. This extension would be placed in its own namespace, which non-audio clients could generally filter out. Either the client simply wouldn't recognize the voice namespace and would ignore it, or XSL scripts on the server would strip the extension out, depending upon the client. Similarly, such servers could work in the other direction, encoding XHTML code with VoxML (Voice Markup Language, a voice transcription and recognition format) or similar extensions when "talking" to a voice-enabled client.
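The server-side filtering described above can be sketched in a few lines. This is only an illustration, written in Python with the standard xml.etree.ElementTree module rather than XSL. The voice namespace URI, the v:prompt element, and the v:pitch attribute are hypothetical names invented for the example, not part of any published module.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
VOICE_NS = "http://example.com/ns/voice"  # hypothetical extension namespace

# A small XHTML document that mixes in hypothetical voice markup.
DOC = f"""<html xmlns="{XHTML_NS}" xmlns:v="{VOICE_NS}">
  <body>
    <p v:pitch="low">Welcome back.</p>
    <v:prompt dialect="en-GB">Say a command.</v:prompt>
  </body>
</html>"""


def strip_namespace(root, ns):
    """Remove every element and attribute that belongs to namespace `ns`."""
    prefix = "{" + ns + "}"
    for parent in list(root.iter()):
        # Drop child elements from the extension namespace.
        for child in list(parent):
            if child.tag.startswith(prefix):
                parent.remove(child)
        # Drop attributes from the extension namespace.
        for name in list(parent.attrib):
            if name.startswith(prefix):
                del parent.attrib[name]
    return root


root = strip_namespace(ET.fromstring(DOC), VOICE_NS)

# Local names of the elements that survive the filter.
remaining_tags = [el.tag.split("}")[1] for el in root.iter()]
paragraph = root.find(f"{{{XHTML_NS}}}body/{{{XHTML_NS}}}p")
```

After the filter runs, only the plain XHTML elements remain, which is roughly what a non-audio client would be sent.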

So when can you run an XHTML document? Well, with a few minor constraints, right now. Most browsers currently in use are non-validating; they don't check whether the HTML is valid, and for the most part they will let wildly non-compliant HTML pass through unhindered, because the rendering engine (the part of the browser that interprets and displays the HTML) is given extremely wide latitude in handling its input.

Ironically, this leniency shouldn't apply to XHTML. XHTML works on the assumption that the code is pure XML, and an XML parser should complain if the XHTML being passed in isn't well-formed. Fortunately, the rules for turning "normal" HTML into XHTML are quite simple.

First, all elements are containers, and must be closed. Any time you create a tag (such as <p>, for paragraph), you must make sure to have a closing tag </p> that closes the current tag. If a tag contains no text or inner elements, it can be terminated with a />. For example, in HTML, the image tag is expressed as <img src="myURL">, while in XHTML, the same tag should either be closed explicitly: <img src="myURL"></img> or terminated within the bracket itself: <img src="myURL"/>. In addition, all attribute values must be enclosed within either single or double quote marks. A practice that, unfortunately, is carried on by most HTML editors is to leave attribute values unquoted (especially ID attributes). This means that even if you follow the rules about elements, almost all HTML editors will not produce valid XHTML.
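One quick way to see these two rules in action is to hand each variant to a strict XML parser. The sketch below uses Python's standard xml.etree.ElementTree as a stand-in for such a parser. The is_well_formed helper is a name made up for this example, and it checks well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if an XML parser accepts the fragment, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Each fragment is wrapped in <p> so that it has a single root element.
html_style = '<p><img src="myURL"></p>'            # img is never closed
explicit_close = '<p><img src="myURL"></img></p>'  # closed with </img>
self_closed = '<p><img src="myURL"/></p>'          # terminated with />
unquoted = '<p id=intro>text</p>'                  # attribute value not quoted

results = {
    "html_style": is_well_formed(html_style),
    "explicit_close": is_well_formed(explicit_close),
    "self_closed": is_well_formed(self_closed),
    "unquoted": is_well_formed(unquoted),
}
```

Only the two closed forms of the image tag are accepted; the HTML-style tag and the unquoted attribute are both rejected by the parser.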

Attributes must always have values. There are a few attributes in HTML, such as the selected attribute within <option> tags, that are written without a value. These minimized attributes are considered invalid, and should be replaced with full attribute-value pairs (such as selected="selected") when they occur. Note that some earlier Netscape browsers (versions 2.0-3.0) may have trouble with this form.
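The same kind of check illustrates the attribute rule. In this sketch, which again assumes Python's standard xml.etree.ElementTree as the XML parser, the minimized HTML form is rejected, while the expanded form selected="selected" parses and exposes the value. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

minimized = '<select><option selected>One</option></select>'
expanded = '<select><option selected="selected">One</option></select>'

# The minimized attribute has no value, so the XML parser refuses it.
try:
    ET.fromstring(minimized)
    minimized_ok = True
except ET.ParseError:
    minimized_ok = False

# The expanded form parses, and the attribute value can be read back.
option = ET.fromstring(expanded).find("option")
selected_value = option.get("selected")
```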

One element cannot overlap another element without containing it completely. The expression <b><i>a test</i></b> is valid, but <b><i>a test</b></i> is not because the italics tag overlaps the bold boundaries. Just remember that an XHTML element is a container, and this error will usually not happen.
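The nesting rule is also easy to confirm with a parser. The sketch below, assuming Python's standard xml.etree.ElementTree and a helper named parse_error invented for this example, returns the parser's complaint for the overlapping version and nothing for the properly nested one.

```python
import xml.etree.ElementTree as ET


def parse_error(fragment: str):
    """Return the parser's error message, or None if the fragment parses."""
    try:
        ET.fromstring(fragment)
        return None
    except ET.ParseError as err:
        return str(err)


# Properly nested: <i> opens and closes entirely inside <b>.
nested = parse_error("<p><b><i>a test</i></b></p>")

# Overlapping: </b> arrives while <i> is still open.
overlapping = parse_error("<p><b><i>a test</b></i></p>")
```

The message for the overlapping case also reports the line and column where the parser gave up, which helps when hunting for the offending tag in a longer page.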

To protect literal content within an XHTML element from the parser, enclose it in a CDATA section. CDATA sections are XML constructs that tell the XML parser not to parse anything within them. A CDATA section starts with the character sequence <![CDATA[ and ends with the string ]]>. This is especially useful with scripts, where the < and > signs may be used for "less than" and "greater than" respectively, not as tag delimiters. In general, if you are conscientious about closing or terminating tags and quoting attribute values, you'll head most XML errors off at the pass.
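As a rough illustration of the script case, the following sketch (assuming Python's standard xml.etree.ElementTree as the XML parser, and a made-up one-line script body) parses the same script element with and without a CDATA section.

```python
import xml.etree.ElementTree as ET

# Hypothetical script body that uses < and && as operators.
SCRIPT_BODY = "if (a < b && b > c) { swap(a, c); }"

with_cdata = (
    '<script type="text/javascript"><![CDATA['
    + SCRIPT_BODY
    + ']]></script>'
)
without_cdata = '<script type="text/javascript">' + SCRIPT_BODY + '</script>'

# Inside CDATA the parser hands the text back untouched.
script_text = ET.fromstring(with_cdata).text

# Without CDATA, the bare < and & are treated as markup and rejected.
try:
    ET.fromstring(without_cdata)
    bare_ok = True
except ET.ParseError:
    bare_ok = False
```

The CDATA version round-trips the script text exactly, while the unprotected version fails to parse at all.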
