The newest addition to the pantheon of web technologies is yet another markup language. XML, otherwise known as the Extensible Markup Language, resembles other markup languages like SGML and HTML. But XML empowers developers to create their own tags and markup languages to suit their needs.
After using XML, you’ll never look at web document design in the same way. Mangling HTML beyond all recognition to achieve a specific web page design goal will soon be a thing of the past. No longer will all documents be forced to fit into the HTML mold. Accompanying the advent of XML is a new way of thinking about documents as well as a new design paradigm. The current “whatever it takes to get the right look” approach to document design will soon give way to a focus on accurate descriptions of document content using the best markup.
Prior to the advent of XML, physicists, trainers, and developers in every other field had to manipulate their data to fit into the HTML document model, even when the fit was far from perfect. Because XML is a markup language used to describe other markup languages (hence the label “meta-language”), you can create your own Document Type Definitions (DTD) to define a set of markup tags specifically tailored to fit your needs. Physicists might create a PML (Physics Markup Language) to describe formulas and other physics-related content. A web-based training company could devise a WBTML (Web-Based Training Markup Language) to catalog its many training offerings or to describe the contents of those offerings.
XML provides a standard set of tools for developers to use when creating markup. These XML-based markup languages are called XML vocabularies or applications. Each is a unique markup solution that meets a specific need or speaks to a specialized audience (see the sidebar “Is XML Right for You?”). Although XML 1.0 became a W3C Recommendation only in February 1998, a wide variety of XML vocabularies are already in use, and still more are in development. The list of available applications will continue to grow as XML becomes more widely accepted as a web technology and as more browsers support XML-based markup directly.
Although each XML vocabulary is unique and varies widely from the others in scope and intent, all have two important things in common. First, each is written using XML, which makes them members of the same extended markup family, built according to the same standard, and readable by any XML-compliant browser. Second, each represents a markup language designed to describe a specific type of content.
XML was created to enable developers to create their own unique vocabularies that would function predictably within a standardized set of structures and rules. Each application that follows the rules of XML can also be processed using standard XML tools. When XML is fully integrated into web browsers, content described using any XML vocabulary will be readable by all XML-compliant browsers.
Markup Describes Content
Web page developers have been working with a markup language for some time now. Although HTML stands for HyperText Markup Language, it has instead been treated as a HyperText Formatting Language. The trend in web page design has been to describe documents using HTML while keeping a close eye on what the documents look like when displayed inside a web browser. An overwhelming concern for how a user agent (that is, a browser) finally displays a document violates the most basic tenets of markup.
The primary goal of markup is to separate the description of a document from its final display. Markup should be used to describe the different parts of a document’s content using tags as labels. Documents that use markup are ASCII (text-based), so they’re platform and operating system independent. A document designer concentrates on using tags to describe a document’s contents as accurately as possible, without regard for its onscreen appearance. In theory, developers don’t even have to know how a document is to be rendered—be it on screen, in print, or through a projector—to describe that document correctly.
In fact, separating markup from display lets you create documents that are independent of platform, operating system, and application. Specialized software applications process such marked-up documents and render each document's final display as needed, based on the descriptions of its parts. These applications are necessarily platform and OS specific. Each XML vocabulary is completely defined by its own unique Document Type Definition (DTD), and you must play by a DTD’s rules to create documents for its vocabulary.
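To make the idea concrete, here is a minimal sketch in Python. It stands in for the specialized applications just described. The sketch takes one marked-up software description and renders it two ways: as a one-line catalog entry for a screen and as a full sentence suited to a text-to-speech reader. The `<package>` markup follows the software catalog example developed below. Both render functions are hypothetical. Nothing in the markup itself says how the content should look or sound.

```python
# A minimal sketch: one marked-up description, two renderings.
# render_for_screen and render_for_speech are hypothetical illustrations.
import xml.etree.ElementTree as ET

PACKAGE = """<package>
  <title>Norton Utilities</title>
  <version>3.5</version>
  <vendor>Symantec</vendor>
  <platform/>
  <OS/>
  <description>A hard disk utility program</description>
  <copies>1</copies>
</package>"""


def render_for_screen(pkg):
    """Compact one-line listing, e.g. for a catalog screen."""
    return f"{pkg.findtext('title')} {pkg.findtext('version')} ({pkg.findtext('vendor')})"


def render_for_speech(pkg):
    """Full sentence, e.g. for a text-to-speech reader."""
    copies = int(pkg.findtext("copies"))
    noun = "copy" if copies == 1 else "copies"
    return (f"We own {copies} {noun} of {pkg.findtext('title')} "
            f"version {pkg.findtext('version')} from {pkg.findtext('vendor')}.")


pkg = ET.fromstring(PACKAGE)
print(render_for_screen(pkg))  # Norton Utilities 3.5 (Symantec)
print(render_for_speech(pkg))  # We own 1 copy of Norton Utilities version 3.5 from Symantec.
```

Both renderers read the same labeled parts. Adding a third output, such as a printed report, means writing another renderer, not re-marking the content.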
A valid XML document has two parts: the DTD and the document element. (XML 1.0 does let a merely well-formed document omit the DTD, but a document written to a particular vocabulary relies on that vocabulary’s DTD.) The DTD lists the tags, and their associated attributes, that may be used in the document. The DTD also specifies several other things:
- which tags may be nested within other tags
- which tags and attributes are required and which are optional
- which entities, such as graphics and non-ASCII characters, may be used in documents created with the vocabulary

The document element is a single element whose start and end tags enclose the document’s entire content and all of its other markup. Document structures are declared in the DTD and used in the document element to describe content.
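Here is a minimal sketch of those two parts, parsed with Python’s `xml.dom.minidom`. The tiny `note` vocabulary is hypothetical. Its internal DTD declares a single element, and the document element that follows uses it. Note that minidom reads the DTD but doesn’t validate the document against it.

```python
# A minimal sketch: an internal DTD followed by the single document element.
# The "note" vocabulary is hypothetical.
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
]>
<note>Count the software licenses before renewing.</note>"""

doc = minidom.parseString(DOC)
print(doc.doctype.name)                      # note (named in the DTD part)
print(doc.documentElement.tagName)           # note (the document element)
print(doc.documentElement.firstChild.data)   # Count the software licenses before renewing.
```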
To create XML documents, you need to be familiar with several basic constructs: elements, attributes, and entities. XML also allows comments, so you can document your code. Below are the basic structures and syntax you’ll need.
Elements are the labels used to describe your content. They’re described in the DTD by element declarations and invoked in the document element as tags. Element declarations by default define tag pairs, like the level 1 heading tag pair (<H1>…</H1>) used in HTML. Tag pairs contain text as well as other elements and their content. An element declaration may also define an empty element, one that isn’t designed to contain any text or other elements, such as the image tag (<IMG>) in HTML.
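In XML markup, an empty element can be written in two ways: as a start tag immediately followed by its end tag, or in a shorthand form that ends in `/>`. The minimal Python sketch below borrows the `platform` element name from the software example. It shows that a parser treats the two spellings identically.

```python
# A minimal sketch: the long and short forms of an empty element
# produce the same parsed result.
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<platform></platform>")
short_form = ET.fromstring("<platform/>")

for elem in (long_form, short_form):
    # Tag name, number of child elements, and text content.
    print(elem.tag, len(elem), elem.text)  # platform 0 None

# Serializing either one yields the same markup.
print(ET.tostring(long_form) == ET.tostring(short_form))  # True
```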
By way of example, imagine that you’re creating a simple document to describe the various software packages your organization owns. The purpose of this document is to catalog the software you have, so you can upgrade to newer versions and avoid accidentally buying duplicate copies of a package. A simple DTD would need elements for the software title, version, vendor, platform and operating system requirements, a brief description of the package, and the number of copies you own. Listing 1 shows the element declarations in the document’s DTD.
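The paragraph that follows describes these declarations closely. Below is a minimal sketch of declarations consistent with that description. It is a reconstruction, not necessarily Listing 1’s exact text. It also adds a declaration for the `software` document element, and `(package*)` is an assumption that allows any number of packages. Python’s `xml.parsers.expat` module is a binding to James Clark’s expat parser, and it reports each declaration through `ElementDeclHandler`. Keep in mind that expat is non-validating: it reads the declarations but won’t reject a document that breaks them.

```python
# A minimal sketch: element declarations reconstructed from the prose,
# read back with expat. expat reports them but does not validate.
import xml.parsers.expat
from xml.parsers.expat import model as content_types

DOC = """<?xml version="1.0"?>
<!DOCTYPE software [
  <!ELEMENT software (package*)>
  <!ELEMENT package (title, version, vendor, platform, OS, description, copies)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT version (#PCDATA)>
  <!ELEMENT vendor (#PCDATA)>
  <!ELEMENT platform EMPTY>
  <!ELEMENT OS EMPTY>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT copies (#PCDATA)>
]>
<software/>"""

models = {}  # element name -> content model tuple reported by expat


def on_element_decl(name, content_model):
    models[name] = content_model


parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(DOC, True)

print(list(models))
# Each model is (type, quantifier, name, children); for the package
# sequence, each child tuple holds the child element's name at index 2.
print([child[2] for child in models["package"][3]])
print(models["platform"][0] == content_types.XML_CTYPE_EMPTY)  # True
```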
These element declarations accomplish several things at once:
- They describe the eight tags that can be used within the document element to describe a software package.
- They specify what content each element can include. The declaration for the package element indicates that exactly one instance each of the title, version, vendor, platform, OS, description, and copies elements must be nested, in that order, within the package element.
- The platform and OS elements are both declared as empty. The remaining five elements may contain only plain text, as indicated by the (#PCDATA) keyword after the element name in each declaration.

Here’s the markup created from these element declarations and used to describe a software package:
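<package>
  <title>Norton Utilities</title>
  <version>3.5</version>
  <vendor>Symantec</vendor>
  <platform/>
  <OS/>
  <description>A hard disk utility program</description>
  <copies>1</copies>
</package>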
The package element contains all of the other tags, as specified in its element declaration. Also, the empty platform and OS elements have a slash before the greater-than sign that closes the tag. This is XML’s syntax for specifying empty elements. Since these two elements are empty, there has to be another way to provide information about the platform and operating system the package supports. That other way is with attributes.
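Python’s standard-library parsers don’t validate, so a quick way to see what the package declaration demands is to check it by hand. The minimal sketch below hard-codes the content model described above: seven children in a fixed order, with platform and OS empty and the rest text-only. `check_package` is a hypothetical helper. A validating parser would derive the same rules from the DTD automatically.

```python
# A minimal sketch of a hand-written structural check for <package>.
# check_package is hypothetical; its rules are copied from the
# element declarations by hand rather than read from the DTD.
import xml.etree.ElementTree as ET

PACKAGE_MODEL = ["title", "version", "vendor", "platform", "OS",
                 "description", "copies"]
EMPTY_ELEMENTS = {"platform", "OS"}

PACKAGE = """<package>
  <title>Norton Utilities</title>
  <version>3.5</version>
  <vendor>Symantec</vendor>
  <platform/>
  <OS/>
  <description>A hard disk utility program</description>
  <copies>1</copies>
</package>"""


def check_package(pkg):
    """Return a list of problems; an empty list means the package conforms."""
    problems = []
    tags = [child.tag for child in pkg]
    if tags != PACKAGE_MODEL:
        problems.append(f"expected children {PACKAGE_MODEL}, got {tags}")
    for child in pkg:
        has_text = bool((child.text or "").strip())
        if child.tag in EMPTY_ELEMENTS and (has_text or len(child)):
            problems.append(f"<{child.tag}> must be empty")
        elif child.tag not in EMPTY_ELEMENTS and len(child):
            problems.append(f"<{child.tag}> may contain only text")
    return problems


print(check_package(ET.fromstring(PACKAGE)))  # []
# Children out of order: one problem reported.
print(check_package(ET.fromstring(
    "<package><version>3.5</version><title>Norton Utilities</title></package>")))
```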
Attributes provide extra information about an element. Specific attributes are defined for individual elements on a case-by-case basis. XML attributes work just like HTML attributes, so this will be familiar territory for web builders. Attributes are defined in a DTD by an attribute list declaration. In the software description document, attributes supply the platform and operating system information that the empty platform and OS elements can’t hold as content. The attribute list declarations for these two elements might look like this:
<!ATTLIST platform TYPE (PPC | Mac) #REQUIRED>
<!ATTLIST OS TYPE (Mac7x | Mac8 | WIN95 | WINNT | WIN98) #REQUIRED>
The attribute list declaration for the platform element indicates that the TYPE attribute can have a value of either PPC or Mac and that the attribute is required. The declaration for the OS element indicates that its TYPE attribute can have a value of Mac7x, Mac8, WIN95, WINNT, or WIN98, and that it is also required. XML 1.0 doesn’t let a required attribute also carry a default value, because a required attribute must always be supplied. If you’d rather have PPC or WIN95 filled in whenever the attribute is omitted, replace #REQUIRED with the quoted default, as in (PPC | Mac) "PPC". You add each attribute to its element as a name-value pair inside the tag, just as you would in HTML.
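The following minimal Python sketch feeds those two attribute list declarations to expat, along with an instance that uses them. It collects results through expat’s `AttlistDeclHandler` and `StartElementHandler` callbacks. The TYPE values in the instance (Mac and Mac8) are hypothetical. A second document swaps #REQUIRED for a quoted default to show the difference: expat fills in the declared default when the attribute is missing. Because expat is non-validating, it would not complain if a required TYPE were left out. Catching that takes a validating parser.

```python
# A minimal sketch: attribute list declarations and the attributes
# expat reports on start tags. Instance values are hypothetical.
import xml.parsers.expat

REQUIRED_DOC = """<?xml version="1.0"?>
<!DOCTYPE package [
  <!ELEMENT package (platform, OS)>
  <!ELEMENT platform EMPTY>
  <!ELEMENT OS EMPTY>
  <!ATTLIST platform TYPE (PPC | Mac) #REQUIRED>
  <!ATTLIST OS TYPE (Mac7x | Mac8 | WIN95 | WINNT | WIN98) #REQUIRED>
]>
<package><platform TYPE="Mac"/><OS TYPE="Mac8"/></package>"""

# Same platform declaration, but with a quoted default instead of #REQUIRED.
DEFAULTED_DOC = """<?xml version="1.0"?>
<!DOCTYPE platform [
  <!ELEMENT platform EMPTY>
  <!ATTLIST platform TYPE (PPC | Mac) "PPC">
]>
<platform/>"""


def read_attributes(doc):
    """Return (attribute declarations, attributes seen on each start tag)."""
    declarations = {}  # element -> (attribute name, default, required?)
    seen = {}          # element -> attribute dict reported by expat

    def on_attlist(elname, attname, attr_type, default, required):
        declarations[elname] = (attname, default, bool(required))

    def on_start(name, attrs):
        seen[name] = attrs

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(doc, True)
    return declarations, seen


decls, seen = read_attributes(REQUIRED_DOC)
print(decls["platform"], seen["platform"])  # ('TYPE', None, True) {'TYPE': 'Mac'}

decls, seen = read_attributes(DEFAULTED_DOC)
print(decls["platform"], seen["platform"])  # ('TYPE', 'PPC', False) {'TYPE': 'PPC'}
```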
Attribute values must always be enclosed in quotation marks, either double or single. One of the rules of XML is that all attribute values, regardless of type, must be quoted. XML also provides other types of attribute values, including text strings (CDATA) and unique identifiers (ID), and not every attribute has to be declared as required.
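A quick way to see the quoting rule in action is to hand a few variations to a parser. In this minimal sketch, `is_well_formed` is a hypothetical helper built on Python’s `xml.etree.ElementTree`. It simply reports whether the parser accepts the markup.

```python
# A minimal sketch: XML accepts double- or single-quoted attribute
# values but rejects unquoted ones. is_well_formed is hypothetical.
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """True if the markup parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed('<OS TYPE="WIN95"/>'))  # True
print(is_well_formed("<OS TYPE='WIN95'/>"))  # True
print(is_well_formed("<OS TYPE=WIN95/>"))    # False: the value is unquoted
```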
An entity is a storage unit that can hold strings or blocks of text (a text entity) or non-XML data like graphics, audio files, and video files (a binary entity). Apart from the five entities predefined by XML (&lt;, &gt;, &amp;, &apos;, and &quot;), every entity used in a document must first be defined with an entity declaration. The declaration assigns the entity a name, which is then used to reference it in the document. Entities are one of the most powerful DTD and content-management structures available in XML, but they’re also a more advanced topic than I have room to address in this article. Any of the XML resources listed will have complete information on creating and implementing XML entities.
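Entities deserve fuller treatment than this article can give them, but the simplest case, an internal text entity, is easy to show. In this minimal sketch, the entity name `vendorname` is hypothetical. It is declared once in the internal DTD, and the parser (Python’s `xml.etree.ElementTree` here) expands it wherever `&vendorname;` appears. The last line shows that the predefined entities need no declaration at all.

```python
# A minimal sketch of an internal text entity, declared once and
# reused. The entity name "vendorname" is hypothetical.
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE package [
  <!ENTITY vendorname "Symantec">
]>
<package>
  <vendor>&vendorname;</vendor>
  <description>A hard disk utility program from &vendorname;</description>
</package>"""

pkg = ET.fromstring(DOC)
print(pkg.findtext("vendor"))       # Symantec
print(pkg.findtext("description"))  # A hard disk utility program from Symantec

# Predefined entities such as &amp; work without any declaration.
print(ET.fromstring("<title>Tips &amp; Tricks</title>").text)  # Tips & Tricks
```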
XML employs the same comment syntax as HTML. Any text or markup located between <!-- and --> is invisible to the application processing the document but is visible to any person working on the document. Use comments to leave notes to yourself or others, or to temporarily disable sections of markup and content, as you would in HTML.
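Here’s a minimal sketch of both uses of comments, parsed with Python’s `xml.etree.ElementTree`, which drops comments by default. The note and the disabled copies element are hypothetical. One difference from casual HTML practice is that XML forbids two hyphens in a row inside a comment, so the parser rejects the last sample.

```python
# A minimal sketch: comments used as a note and to disable markup.
# The parser skips both; "--" inside a comment is an error in XML.
import xml.etree.ElementTree as ET

DOC = """<package>
  <!-- Note to self: confirm the copy count with purchasing. -->
  <title>Norton Utilities</title>
  <!-- <copies>1</copies>  disabled until the count is confirmed -->
</package>"""

pkg = ET.fromstring(DOC)
print([child.tag for child in pkg])  # ['title']

try:
    ET.fromstring("<package><!-- old -- new --></package>")
except ET.ParseError as err:
    print("rejected:", err)
```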
Putting it All Together
So, what does our software description document look like when it’s all put together? Listing 2 shows the final results. You’ll notice that an XML declaration (<?xml version="1.0"?>) begins the document. This specifies which version of XML the document was written for. The DTD portion of the document begins with <!DOCTYPE software. This document type declaration indicates that the DTD is about to begin and that the document element for the document is software. The non-DTD portion of the document always begins with the document element’s start tag and ends with its end tag. The document can hold as many package descriptions as necessary, as long as they all fall between the <software> and </software> tags.
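As a sketch of how such a document fits together, the Python string below holds three parts: an XML declaration, an internal DTD built from the declarations described earlier, and a software document element containing two packages. This is an illustration, not necessarily Listing 2’s exact text. The second package and all TYPE values are made up for illustration. `(package*)` is an assumption that lets the catalog hold any number of packages. The short loop shows that once the document is parsed, every package description is easy to reach.

```python
# A minimal sketch of a complete software description document.
# The second package and the TYPE values are illustrative only.
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE software [
  <!ELEMENT software (package*)>
  <!ELEMENT package (title, version, vendor, platform, OS, description, copies)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT version (#PCDATA)>
  <!ELEMENT vendor (#PCDATA)>
  <!ELEMENT platform EMPTY>
  <!ATTLIST platform TYPE (PPC | Mac) #REQUIRED>
  <!ELEMENT OS EMPTY>
  <!ATTLIST OS TYPE (Mac7x | Mac8 | WIN95 | WINNT | WIN98) #REQUIRED>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT copies (#PCDATA)>
]>
<software>
  <package>
    <title>Norton Utilities</title>
    <version>3.5</version>
    <vendor>Symantec</vendor>
    <platform TYPE="PPC"/>
    <OS TYPE="WIN95"/>
    <description>A hard disk utility program</description>
    <copies>1</copies>
  </package>
  <package>
    <title>Example Editor</title>
    <version>2.0</version>
    <vendor>Example Vendor</vendor>
    <platform TYPE="Mac"/>
    <OS TYPE="Mac8"/>
    <description>A hypothetical text editor</description>
    <copies>4</copies>
  </package>
</software>"""

root = ET.fromstring(DOC)
print(root.tag)  # software
for pkg in root.iter("package"):
    print(pkg.findtext("title"), pkg.find("OS").get("TYPE"), pkg.findtext("copies"))

total = sum(int(p.findtext("copies")) for p in root.iter("package"))
print(total)  # 5
```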
The DTD portion of the document doesn’t always have to be stored inside the document. Instead, the DTD can be saved in its own file for reference by several different documents. If I removed the DTD from the document and saved it in a file called software.dtd, the document type declaration would point to that external file with this syntax:
<!DOCTYPE software SYSTEM "software.dtd">
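Here is a minimal sketch of that external reference, parsed with Python’s `xml.dom.minidom`. The parser records the DTD’s name and system identifier. It doesn’t fetch software.dtd or validate against it, so the file doesn’t even need to exist for this example to run.

```python
# A minimal sketch: a document type declaration that points to an
# external DTD. minidom records the reference without loading it.
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<!DOCTYPE software SYSTEM "software.dtd">
<software/>"""

doc = minidom.parseString(DOC)
print(doc.doctype.name)             # software
print(doc.doctype.systemId)         # software.dtd
print(doc.documentElement.tagName)  # software
```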
(See the sidebar “Selecting the Right XML Vocabulary.”) All XML documents are a variation on this theme. Different vocabularies provide the DTD, but the structures are defined in the same manner and invoked in the document element with tags. Once you learn how to read a DTD and write quality markup, the rest is just knowing what each tag does, and that’s nothing new in the web world.
XML Design vs. HTML Design
To design flawless XML documents, web page developers will have to leave many of their old HTML habits behind and learn to use markup languages as intended. Even veteran page designers might be shocked to know that they’ve been using HTML incorrectly all this time. That is, they’ve been treating it as a formatting mechanism instead of a description mechanism.
The forgiving nature of web browsers and the focus of books, magazines, and web sites on tips and tricks for controlling the final display of web pages has helped promote HTML as a formatting language. Although mangled HTML markup might lead to tightly controlled, well-designed web pages, mangled XML leads only to heartache.
Start your journey into the XML design world by leaving this HTML baggage behind:
- Don’t do “whatever it takes” to achieve the desired look and feel. This approach to page design is a direct violation of the spirit of markup. Remember, you’re describing content, not formatting it.
- Stop designing for one browser or another. The whole idea behind markup is that documents are created without regard for their final appearance. As long as your documents are well described, they will be rendered correctly.
- Don’t try to force “round” content pegs into “square” markup holes. You’re not limited to plain HTML anymore, so don’t feel you have to force your content to fit any specific notation. Choose the markup that best describes your content and your information processing needs.
- Don’t ignore the rules because the browsers do. By its very nature, XML requires you to follow its rules, and you’ll find that if you follow the HTML rules more closely, you’ll achieve better overall results in the end.
- Stop focusing on your document’s final look and feel as displayed on a graphical screen. Although the majority of HTML documents are viewed with graphical browsers, they are not limited to GUI interpretation. XML is designed to allow documents to be rendered in a wide variety of ways from computer screens, to text-to-speech readers, to projection systems, to printers.
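To see why mangled XML leads to heartache, try a few common HTML habits on an XML parser. The samples in this minimal Python sketch are hypothetical. Browsers accept each one when it appears in HTML, yet `xml.etree.ElementTree` rejects all three with a `ParseError`. The corrected markup at the end parses cleanly.

```python
# A minimal sketch: HTML habits that an XML parser refuses.
# The sample markup is hypothetical.
import xml.etree.ElementTree as ET

HTML_HABITS = {
    "unclosed tag": "<description>Disk tools<description>",
    "overlapping tags": "<description><b><i>Disk</b></i></description>",
    "mismatched case": "<Title>Norton Utilities</title>",
}

for habit, markup in HTML_HABITS.items():
    try:
        ET.fromstring(markup)
        print(f"{habit}: accepted")
    except ET.ParseError as err:
        print(f"{habit}: rejected ({err})")

# Properly closed, properly nested, consistently cased markup parses.
fixed = ET.fromstring("<description><b><i>Disk</i></b> tools</description>")
print(fixed.tag)  # description
```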
Of course, designing the document is only half the issue. What about display and implementation issues? A document won’t do you any good if you can’t share it with others. That’s where the realities of implementation set in.
Implementing XML Today
XML is an emerging technology and the majority of parsers and browsers written to process XML documents are experimental at best. Currently, you can’t transfer an XML document to your web server, point at it with a URL or hyperlink, and expect the average browser to know what to do with it. For an XML document based on any vocabulary to be rendered by a user agent it must first be parsed, and today’s web browsers can only parse and display HTML pages. All other markup is mostly foreign to them.
The creators of the various XML DTDs already under development realize that web browsers don’t support their documents. To remedy this situation, many of them have developed specialized parsers and browsers, geared specifically to their XML vocabularies, written in Java for embedding in web pages. When you choose an XML vocabulary, you’ll want to find out what parsers and tools have been developed to help you display documents written to that vocabulary. For example, the creators of the Chemical Markup Language (CML) have developed a CML browser called Jumbo, as well as a series of applets to parse and view CML documents. Similar activities are underway for many other XML vocabularies that have already been developed.
When Netscape released the Mozilla source code for what would have been Navigator 5.0, it included XML support for the Resource Description Framework (RDF) vocabulary. The source code also included a version of James Clark’s expat XML parser. Although Netscape’s inclusion of XML support in its browser isn’t quite as extensive as Microsoft’s, it is a good start. Based on the reactions to XML from the two top browser vendors, it’s obvious that XML is a legitimate web phenomenon. Look for increased XML support in future versions of both browsers, and from other browser vendors as well (see the sidebar “Real-Life XML: How Microsoft Channels Work”).
You might be concerned that XML will inhibit your creativity or limit your page design options. Never fear: XML is going to revolutionize page design by adding whole new groups of markup to your design arsenal. You’ll have the right tools available for the job, and that’s far better than using the wrong tool because it’s the only one you’ve got. Have you ever tried to pound a nail into a wall with a screwdriver because you didn’t have a hammer? XML gives you a hammer, a power saw, a cordless drill, and every other power tool you’d possibly need to design amazingly creative and content-oriented web pages. Don’t fear XML; try it!