XML is a fundamentally simple idea (take bits of content and give them identifying tags), but it has far-reaching effects. In just a few short years, XML's evolution has sparked an explosion of innovation that's touched nearly every facet of computing, even the most basic computing building blocks, such as file systems, databases, displays, and communications. And it's not done yet. It won't be long before XML permeates nearly every system, application, and data store within reach. Think I'm exaggerating? Look at what XML has already accomplished.
At first, most developers equated XML with Web applications, because it looked like HTML. Some disparaged it as nothing but a bloated delimited text file. It's true; XML is a bloated delimited text file, but with that bloat come five capabilities that more than justify the bloat and differentiate XML from delimited or fixed-width text files:
- XML is a delimited text file with standard and universal construction rules. Documents that adhere to these rules are said to be "well-formed." The rules are extremely simple, which makes XML easy to parse. Any XML parser can parse any well-formed XML file.
- XML is hierarchical, and has the ability to carry not only regular, symmetrical data, such as values from database tables, but also irregular data, such as articles, books, and program objects.
- XML contains not only data, but also carries some information about that data, via the tags and hierarchical structure used in the file. Each tag can be bound to a namespace, which ensures that common tags don't get confused. For example, my <name> tag in my namespace isn't the same as your <name> tag in your namespace, even if we merge the documents.
- XML schemas (yes, I'm ignoring DTDs) extend the minimal meaning carried by the XML markup tags and namespaces. Schemas specify the allowable tag names, structure, and valid content of XML documents.
- Taken together, schemas, namespaces, and tagged content give applications the ability to validate a document against a schema. Because that capability is built into validating XML parsers, developers can pass data between methods, applications, and organizations without having to write complex, error-prone validation code at each endpoint. Moreover, schemas let applications automatically convert the text representation of a value stored in an XML file into a more useful typed data value, such as a date or object.
In contrast, plain text files don't conform to a standard or to universal rules, aren't well suited to hierarchical or irregular data, carry little or no information about the data they contain, and aren't usually accompanied by a schema. The result is a wide variety of application-specific formats that force developers to write custom code to parse and validate the contents.
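To make the first and third capabilities concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The document, the two namespace URIs, and the helper names (is_well_formed, names_in) are invented for illustration. The sketch covers well-formedness and namespaces only; it does not perform schema validation, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET

# A merged document holding two <name> elements from different namespaces.
# Both namespace URIs are made up for this example.
MERGED = """\
<contacts xmlns:a="urn:example:alice" xmlns:b="urn:example:bob">
  <a:name>Alice's product name</a:name>
  <b:name>Bob's customer name</b:name>
</contacts>"""

# The <name> element is never closed, so this text is not well-formed.
NOT_WELL_FORMED = "<contacts><name>unclosed</contacts>"


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def names_in(text: str, namespace: str) -> list:
    """Collect the text of the <name> elements that belong to one namespace."""
    root = ET.fromstring(text)
    # ElementTree writes a qualified tag as {namespace-uri}local-name.
    return [el.text for el in root.iter("{" + namespace + "}name")]


if __name__ == "__main__":
    print(is_well_formed(MERGED))
    print(is_well_formed(NOT_WELL_FORMED))
    print(names_in(MERGED, "urn:example:alice"))
```

Even though both elements are spelled "name", the lookup by namespace returns only the one that belongs to the requested vocabulary.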
These five capabilities led to widespread changes for developers.
XML Supplants HTML
First, HTML, which had been undergoing rapid evolution, became obsolete, mutating into an XML-based reformulation called XHTML. That process is still under way, partially because HTML tools vendors didn't support XHTML as fast as they should have, and partially because Web developers didn't see immediate advantages in XHTML, so they continued (and some still continue) to write HTML instead. If there's one bright spot in the Eolas patent lawsuit decision, it's that companies will have to alter large numbers of these obsolete HTML pages and thus may finally get the message that they should start writing Web pages that are XHTML compliant.
XML Captures Configuration Files
Look at almost any modern application, and you'll find XML configuration files that control its behavior. Applications often let administrators and users control some aspects of their behavior through external files, called configuration files. For example, an application may need a database connection string that controls where it should store data. A more complex application may need to assign different connection strings based on a user's role within the application. Simple applications used simple text or INI (initialization) files, while more complex applications used proprietary text-based formats or binary files. INI files had size limitations and were unsuitable for storing deep hierarchical data. Proprietary files often required a custom interface, and modifying them to accommodate changing application needs was difficult. Binary files weren't human-readable. Modern applications use XML because it solves these problems: it consists of human-readable and modifiable text, supports deep hierarchical data, has a regular and verifiable structure, and accommodates structural changes easily.
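As a rough illustration of the role-based connection string scenario, the sketch below reads a small XML configuration file with Python's standard xml.etree.ElementTree module. The element names, the role attribute, and the connection strings are all hypothetical; they are not taken from any particular product's configuration format.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# A hypothetical configuration file: one connection string per user role.
CONFIG = """\
<configuration>
  <connectionStrings>
    <add role="admin" value="Server=db1;Database=app;User=admin_user" />
    <add role="reader" value="Server=db2;Database=app;User=read_only" />
  </connectionStrings>
</configuration>"""


def connection_string_for(config_text: str, role: str) -> Optional[str]:
    """Return the connection string configured for a role, or None."""
    root = ET.fromstring(config_text)
    for entry in root.findall("./connectionStrings/add"):
        if entry.get("role") == role:
            return entry.get("value")
    return None


if __name__ == "__main__":
    print(connection_string_for(CONFIG, "reader"))
```

Adding a new role means adding one more element to the file. The lookup code does not change, which is the kind of structural flexibility that INI and binary formats made difficult.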
XML Underlies Web Services
Having conquered HTML, XML claimed DCOM and CORBA as its next victims. Because you can represent both objects and data in XML, other, proprietary standards are no longer needed except where performance is paramount. While it's faster to pass tightly packed binary data over a network, it's also far less efficient from both a time-to-market and a cross-platform point of view. Web services and SOAP provide a standardized way of representing data and objects between two endpoints. They're easier to create, easier to debug, and work seamlessly across all platforms. More than that, they're automatable and work over HTTP, which makes it much easier for organizations to expose functionality to outside consumers. Amazon's and eBay's success with commercial Web services attests to the viability of Web services on a global scale.
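To show what such a message looks like on the wire, here is a minimal sketch that builds and reads a SOAP 1.1 envelope with Python's standard xml.etree.ElementTree module. The envelope namespace is the standard SOAP 1.1 one; the application namespace, the GetPrice operation, and the ItemId parameter are invented for this example and do not correspond to any real service. A real client would normally generate this plumbing from a service description rather than write it by hand.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:prices"  # hypothetical application namespace
PREFIXES = {"soap": SOAP_NS, "app": APP_NS}


def build_request(item_id: str) -> bytes:
    """Serialize a request for the hypothetical GetPrice operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    call = ET.SubElement(body, "{%s}GetPrice" % APP_NS)
    ET.SubElement(call, "{%s}ItemId" % APP_NS).text = item_id
    return ET.tostring(envelope, encoding="utf-8")


def read_item_id(message: bytes) -> str:
    """Extract the ItemId value from a request built by build_request."""
    root = ET.fromstring(message)
    node = root.find("soap:Body/app:GetPrice/app:ItemId", PREFIXES)
    return node.text


if __name__ == "__main__":
    request = build_request("B0001")
    print(request.decode("utf-8"))
    print(read_item_id(request))
```

Because the message is plain text, either endpoint can log it and inspect it with ordinary tools, which is much of what makes Web services easier to debug than a binary protocol.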
XML Marks Up GUIs
HTML defines user interfaces, but it's a closed language, fixed by the W3C to a limited set of tags. That fixed nature acts as an unbreakable wall: any concept not already in HTML is not describable in HTML. Fortunately, unlike HTML, XML is not limited to specific tags; therefore, you can use it to describe any user interface. Products such as Mozilla (XUL) have already taken advantage of this, but XML-based UI descriptions will gain even more widespread penetration with the XAML UI description language built into Longhorn. Using XML, the same file can describe a UI implemented on any platform, yet one more example of the old adage that you can accomplish almost anything in programming by adding a layer of indirection.
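The layer of indirection can be sketched in a few lines. The example below uses a made-up UI vocabulary (window, label, textbox, button), which is neither XUL nor XAML, and renders the same description two ways: as a plain-text mockup and as HTML. The renderer names are hypothetical, and a real toolkit would map the same elements to native widgets.

```python
import xml.etree.ElementTree as ET
from html import escape

# One UI description in an invented vocabulary (not XUL or XAML).
UI = """\
<window title="Login">
  <label text="User name" />
  <textbox id="user" />
  <button text="OK" />
</window>"""


def render_text(ui_text: str) -> list:
    """Render the description as lines of a plain-text mockup."""
    root = ET.fromstring(ui_text)
    lines = ["== " + root.get("title", "") + " =="]
    for widget in root:
        if widget.tag == "label":
            lines.append(widget.get("text", "") + ":")
        elif widget.tag == "textbox":
            lines.append("[__________]")
        elif widget.tag == "button":
            lines.append("( " + widget.get("text", "") + " )")
    return lines


def render_html(ui_text: str) -> str:
    """Render the same description as an HTML fragment."""
    root = ET.fromstring(ui_text)
    parts = ["<h1>" + escape(root.get("title", "")) + "</h1>"]
    for widget in root:
        if widget.tag == "label":
            parts.append("<label>" + escape(widget.get("text", "")) + "</label>")
        elif widget.tag == "textbox":
            widget_id = escape(widget.get("id", ""))
            parts.append('<input type="text" id="' + widget_id + '">')
        elif widget.tag == "button":
            parts.append("<button>" + escape(widget.get("text", "")) + "</button>")
    return "".join(parts)


if __name__ == "__main__":
    print("\n".join(render_text(UI)))
    print(render_html(UI))
```

The description never mentions a platform. Each renderer decides what a textbox or a button means on its own target.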
XML Dominates Future File Systems
Since the early days of computing, file systems have consisted of hierarchical lists, composed of directories and files. Any directory can contain both files and other directories, a concept equivalent to "mixed-content" elements in XML, which can contain both data and other elements. But file systems work on a fixed attribute model, where each type of content has specific fixed attributes such as size, name, creation date, last modified date, last access date, etc. That fixed-attribute model is fast, but it greatly limits the meta-information that can accompany a file, and thus limits the way files can be organized. XML eliminates these restrictions by treating files as attachments to a customizable XML document.
It's often extremely convenient to organize a file into several categories, or to provide comments that accompany the file, but aren't integrated into the file itself. For example, assume you have a file that contains an article about XML and XSLT. With a simple hierarchical file system, you have to decide where to place the file, perhaps somewhere such as c:\My Documents\XML Articles\XSLT. Worse, you have to remember where you've placed it to find it again ("Was that file in XML Articles or was it under Transformation Languages? Hmm") or perform a full-text search to retrieve its location. Using an XML-based system, you can categorize the file in any number of categories, such as markup languages, transformation languages, XML, XSLT, Articles, etc., and then retrieve it using any of the associations. Further, an XML-based system has the potential to provide instant annotation capabilities for any file, not just those whose associated applications include annotation capabilities.
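One way to picture such a system is a catalog document in which each file entry carries any number of categories and notes. The sketch below is only an assumption about how that might look: the catalog layout, the file paths, and the files_in_category helper are invented for illustration and do not describe any actual file system.

```python
import xml.etree.ElementTree as ET

# A hypothetical catalog: files are "attachments" described by XML metadata.
CATALOG = """\
<catalog>
  <file path="C:/docs/xslt-intro.doc">
    <category>XML</category>
    <category>XSLT</category>
    <category>Transformation Languages</category>
    <note>Review the section on templates.</note>
  </file>
  <file path="C:/docs/budget.xls">
    <category>Finance</category>
  </file>
</catalog>"""


def files_in_category(catalog_text: str, category: str) -> list:
    """Return the paths of all files tagged with the given category."""
    root = ET.fromstring(catalog_text)
    return [
        entry.get("path")
        for entry in root.iter("file")
        if any(c.text == category for c in entry.findall("category"))
    ]


def notes_for(catalog_text: str, path: str) -> list:
    """Return the annotations attached to one file, whatever its type."""
    root = ET.fromstring(catalog_text)
    for entry in root.iter("file"):
        if entry.get("path") == path:
            return [n.text for n in entry.findall("note")]
    return []


if __name__ == "__main__":
    print(files_in_category(CATALOG, "XSLT"))
    print(files_in_category(CATALOG, "Transformation Languages"))
    print(notes_for(CATALOG, "C:/docs/xslt-intro.doc"))
```

The same article turns up under either category, and the note travels with the catalog entry rather than with the application that created the file.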