If you’ve been reading any of the headlines, you’d think that XML is the greatest invention since auto-start coffee makers. It’s being touted as the wave of the future.
But with auto-start coffee makers, you wake up and pour your coffee. The benefits are clear and obvious.
With XML, the immediate benefits aren’t quite so clear. By itself, XML doesn’t do anything except identify pieces of your document. What makes it exciting is how having “smart” documents can help you work smarter too.
In this QuickStart you’ll learn what XML is all about, and what you can do with it.
What Is XML?
XML is a way of adding intelligence to your documents. It lets you identify each element using meaningful tags, and it lets you add information (“metadata”) about each element.
XML is very much a part of the future of the Web, and part of the future for all electronic information.
Here’s the formal definition from the World Wide Web Consortium (W3C):
“The Extensible Markup Language (XML) is a subset of SGML…. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.”
In other words:
XML—eXtensible Markup Language—is a way to put structure and metadata into your Web page, and make your information smarter and more powerful.
XML is a syntax for marking up data and it works with many other technologies to display and process information. It looks and feels very much like HTML.
XML isn’t going to replace everything else you’ve already learned; it complements it and extends it.
What’s the Fuss About?
XML lets you make documents smarter, more portable, and more powerful—that’s the promise of XML and that’s what all the fuss is about.
XML allows you to use your own tags to define parts of a document. You can do this because XML is a descriptive, not a procedural, language. That is, XML describes what something is rather than performing an action.
For example, take a look at the front page of a newspaper. You’ll see different font sizes, different sections, and columns.
If you were to create a Web page for that newspaper, using the same formatting and styles, you would use tags such as <font> to define the size and color of a large headline, or <i> to italicize a word such as a byline, in order to distinguish it from the rest of the text.
But just try to write tags that actually explain that you’ve got a Headline and that the words “John Smith” make up a byline. HTML won’t know what you’re talking about if you create tags with names such as headline or byline.
XML, with help from other technologies such as CSS, understands what the elements are and how to display them.
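As a rough illustration of descriptive markup, the sketch below marks up a front-page story with made-up tags and reads the pieces back by name, using Python's standard xml.etree.ElementTree module. The tag names (story, headline, byline, body) and the sample headline and body text are assumptions chosen for this example; they are not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tag names and sample text are invented
# for illustration only.
STORY_XML = """<story>
  <headline>City Council Approves New Park</headline>
  <byline>John Smith</byline>
  <body>The council voted on Tuesday.</body>
</story>"""

story = ET.fromstring(STORY_XML)

# Because the tags say what each piece is, a program can ask for it by name.
headline = story.findtext("headline")
byline = story.findtext("byline")

print(f"Headline: {headline}")
print(f"Byline: {byline}")
```

Nothing in the markup says how big the headline is or whether the byline is italic. Those decisions are left to a stylesheet or to whatever program displays the story.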
That means, in the future, when you’re searching on the Web for, say, a Barbie doll for your niece’s birthday, you’ll get Barbie the DOLL instead of some other type of Barbie, because the Barbie doll page can be marked up with tags that identify Barbie as a doll.
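One way to picture this is a small catalog in which two items share the name Barbie but carry different metadata. The sketch below is only an assumption about how such markup might look: the catalog, item, and name tags, the type attribute, and the find_items helper are all hypothetical names invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: element and attribute names are assumptions.
CATALOG_XML = """<catalog>
  <item type="doll"><name>Barbie</name></item>
  <item type="barbecue"><name>Barbie</name></item>
</catalog>"""


def find_items(xml_text, name, item_type):
    """Return the items whose name and type both match."""
    root = ET.fromstring(xml_text)
    return [
        item
        for item in root.iter("item")
        if item.findtext("name") == name and item.get("type") == item_type
    ]


dolls = find_items(CATALOG_XML, "Barbie", "doll")
print(len(dolls), dolls[0].get("type"))
```

A plain text search for "Barbie" would match both items. The search here can tell them apart only because the markup records what kind of thing each one is.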
Pretty cool, huh?
XML documents can be moved to any format on any platform—without the elements losing their meaning. That means you can publish the same information to a Web browser, a PDA, or a network-enabled bread machine and each device would use the information appropriately.
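To make the idea of publishing one source to several devices more concrete, here is a minimal sketch that reads a single XML document and produces two renderings: HTML for a browser and plain text for a small text-only display. The recipe markup, its tag names, and the to_html and to_plain_text functions are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical recipe markup; tag names are assumptions for this sketch.
RECIPE_XML = """<recipe>
  <title>Basic White Bread</title>
  <ingredient>3 cups flour</ingredient>
  <ingredient>1 cup water</ingredient>
</recipe>"""

recipe = ET.fromstring(RECIPE_XML)
title = recipe.findtext("title")
ingredients = [node.text for node in recipe.findall("ingredient")]


def to_html(title, ingredients):
    """Render the recipe for a Web browser."""
    items = "".join(f"<li>{escape(item)}</li>" for item in ingredients)
    return f"<h1>{escape(title)}</h1><ul>{items}</ul>"


def to_plain_text(title, ingredients):
    """Render the recipe for a small text-only display, such as a PDA."""
    return "\n".join([title.upper()] + [f"- {item}" for item in ingredients])


print(to_html(title, ingredients))
print(to_plain_text(title, ingredients))
```

The XML itself never changes. Each output format is just a different way of presenting the same labeled pieces.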
The most important thing to remember about XML, though, is that it doesn’t stand alone. It needs other technologies, like CSS, in order for you to see its results.
If all of this seems like a pain, and you don’t want to mess with XML, it’s OK. You don’t need it to make a great Web page. But you never know when organization will come in handy.
Where Did XML Come From?
XML is a simplified version of SGML and a cousin of HTML. It was developed by members of the W3C and released as a recommendation by the W3C in February 1998.
SGML, the parent of XML, is an international standard that has been in use as a markup language primarily for technical documentation and government applications since the early 1980s. It was developed to standardize the production process for large document sets. Think: Medical records. Company databases. Aircraft parts catalogs. Other really huge documents.
Marking up documents in SGML allows information to be passed from one system to the next without losing information. With databases marked up in SGML you can see what Widget A is all about and go check to see if Widget A is in stock.
Early on, people thought that SGML would be useful for the Web. In fact, HTML is really a very basic application of SGML! But HTML quickly became used for visual layout, so a group of people returned to the basics, determined to create something that had the strengths of SGML without being so difficult to implement, and the ease of use of HTML with more structural power. The result was XML.
The design goals of XML, taken from the XML Specification, are:
- XML shall be straightforwardly usable over the Internet.
- XML shall support a wide variety of applications.
- XML shall be compatible with SGML.
- It shall be easy to write programs which process XML documents.
- The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
- XML documents should be human-legible and reasonably clear.
- The XML design should be prepared quickly.
- The design of XML shall be formal and concise.
- XML documents shall be easy to create.
- Terseness in XML markup is of minimal importance.
In other words, XML is easy to create, easy to read, and designed for use over the Internet. What more could a Web designer ask for?
What Does XML Look Like?
If you’ve ever used HTML, XML is going to look very familiar!
When you view the source of a document written in XML, the first thing you’ll see is the XML declaration, which looks like this: <?xml version="1.0"?>
Then, in the body of the document, you’ll see a lot of tags. The tags look familiar at first: they start with the usual less-than sign (<) and end with the usual greater-than sign (>).
But then you’ll notice that the tags might not be quite the names you’ve come to expect! You’ll see tag names that seem to be made up.
Suppose you’re looking at a Web page marked up in XML on The Canterbury Tales by Chaucer. You’re looking specifically at lines 282-286 of “The Physician’s Tale.” The document source for that section might look like this:
The Physician's Tale
That no man woot therof but God and he.
For be he lewed man, or ellis lered,
He noot how soone that he shal been afered.
Therfore I rede yow this conseil take —
Forsaketh synne, er synne yow forsake.
The tags simply define that:
1) This document is the Canterbury Tales.
2) This section is the Physician’s Tale.
3) Each line of the Physician’s Tale is defined.
4) Each line ends, and the Physician’s Tale and The Canterbury Tales end.
If the entire document were marked up like this, you could easily jump to a certain line or section. The entire document is annotated for easy reference and searching, and instead of viewing the entire document, users could request only specific sections of a document, simply by calling the specific tags they want. Oh, and we don’t recommend that you manually type out each line in the Canterbury Tales. Get a computer to count the lines for you.
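The sketch below shows how requesting one specific line could work in practice. The markup is only a guess at what such a document might look like: the tag names (canterbury-tales, tale, line), the title and n attributes, and the get_line helper are assumptions invented for this example, not the markup of any real edition.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tag and attribute names are assumptions.
TALES_XML = """<?xml version="1.0"?>
<canterbury-tales>
  <tale title="The Physician's Tale">
    <line n="282">That no man woot therof but God and he.</line>
    <line n="283">For be he lewed man, or ellis lered,</line>
    <line n="284">He noot how soone that he shal been afered.</line>
    <line n="285">Therfore I rede yow this conseil take —</line>
    <line n="286">Forsaketh synne, er synne yow forsake.</line>
  </tale>
</canterbury-tales>"""

root = ET.fromstring(TALES_XML)


def get_line(root, tale_title, number):
    """Return the text of one numbered line from the named tale, or None."""
    for tale in root.iter("tale"):
        if tale.get("title") == tale_title:
            for line in tale.iter("line"):
                if line.get("n") == str(number):
                    return line.text
    return None


print(get_line(root, "The Physician's Tale", 286))
```

Because every line carries its own number, a reader or a program can ask for line 286 directly instead of scrolling through the whole tale.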
XML Versus HTML
HTML and XML are cousins. They draw on the same inspiration, SGML. They both identify elements in your page. They both use a very similar syntax. If you are familiar with HTML, XML will also feel familiar.
The big difference between HTML and XML is that HTML has evolved into a markup language that describes the look, feel, and action of a Web page. An <h1> is a headline that is displayed in a certain size, for example.
In contrast, XML doesn’t describe how a page looks, how it acts or what it does. XML describes what the words in a document are. This is a critical distinction! While HTML combines structure and display, XML separates them. This means that XML documents are more portable and can be used in many different types of applications.
In the near future, we’ll see both XML and HTML documents. Eventually, XML will probably replace HTML, or HTML will become an application of XML. But that doesn’t mean you should toss out everything you know! In many ways, XML builds on HTML and if you know HTML, XML will be easier to work with.
Valid and Well-Formed XML
You’ll sometimes hear an XML document referred to as a “valid” XML document or a “well-formed” XML document. This distinction touches on one of the nice things about XML.
When you used SGML, you had to create something called a Document Type Definition (DTD, for short) in order to make the SGML document useful. DTDs were fairly complex and required a lot of work to create. They were one of the roadblocks to widespread use of SGML.
With XML you have an option. You can make a well-formed XML document by simply following the XML syntax rules. You don’t have to create a separate DTD if you don’t want to.
If you do create a set of rules (a DTD) and make your document conform to those rules, it is considered a valid XML document.
DTDs describe the structure of your document. We’ll be discussing DTDs in detail later on. Right now, all you need to know is that the main difference between valid and well-formed XML is that valid XML refers to and conforms to a DTD and well-formed XML doesn’t.
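Well-formedness is the easier of the two properties to check, because it depends only on the syntax rules. The sketch below uses Python's standard xml.etree.ElementTree parser, which rejects documents that break those rules; the is_well_formed helper and the sample markup are assumptions made up for this example. Note that this parser does not check a document against a DTD, so testing validity would need a validating parser, which is not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text follows the XML syntax rules."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Every opening tag has a matching closing tag, properly nested.
good = "<poem><line>Forsaketh synne</line></poem>"

# The <line> tag is never closed, so the syntax rules are broken.
bad = "<poem><line>Forsaketh synne</poem>"

print(is_well_formed(good))
print(is_well_formed(bad))
```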
Now you have a basic understanding of what XML is all about. If you’re reading this, you’ve probably decided that XML is something you’d like to explore further. Where might you go to learn more about XML?
For starters, we recommend the rest of the Developer Zone. This section walks you through many more details of incorporating XML into your site. In the Reference Section is a full set of additional links and resources.
Another thing you can do is look at some examples of how people are using XML. Here are two good examples from InsideDHTML.com (note: these examples require IE 4.x, 5.x, or later browsers):
Weather, from InsideDHTML.com, is a straightforward example of XML for a custom weather report.
Poetry, also from InsideDHTML.com, allows you to write your own poem, directly on a Web page, and see it display with XML.
Tim Bray’s annotated spec is a great way of going directly to the source, so to speak. (Tim was one of the authors of the XML specification.)
Another possibility is to visit DevX Discussions, where you can compare notes with others who are working with XML and the Web.
Thanks for visiting the Tips! We hope you understand a bit more about XML now than when you first stopped by.