XML is a fundamentally simple idea (take bits of content and give them identifying tags), but it has far-reaching effects. In just a few short years, XML’s evolution has sparked an explosion of innovation that’s touched nearly every facet of computing, even the most basic computing building blocks, such as file systems, databases, displays, and communications. And it’s not done yet. It won’t be long before XML permeates nearly every system, application, and data store within reach. Think I’m exaggerating? Look at what XML has already accomplished.
At first, most developers equated XML with Web applications, because it looked like HTML. Some disparaged it as nothing but a bloated delimited text file. It’s true: XML is a bloated delimited text file, but with that bloat come five capabilities that more than justify it and differentiate XML from delimited or fixed-width text files:
- XML is a delimited text file with standard and universal construction rules. Documents that adhere to these rules are said to be “well-formed.” The rules are extremely simple, which makes XML easy to parse. Any XML parser can parse any well-formed XML file.
- XML is hierarchical, and has the ability to carry not only regular, symmetrical data, such as values from database tables, but also irregular data, such as articles, books, and program objects.
- XML contains not only data, but also carries some information about that data, via the tags and hierarchical structure used in the file. Each tag can be associated with a namespace, which ensures that common tags don’t get confused. For example, my tag in my namespace isn’t the same as your tag in your namespace, even if we merge the documents.
- XML schemas (yes, I’m ignoring DTDs) extend the minimal meaning carried by the XML markup tags and namespaces. Schemas specify the allowable tag names, structure, and valid content of XML documents.
- Taken together, schemas, namespaces, and tagged content give applications the ability to validate a document against a schema. Because that capability is built into validating XML parsers, developers can pass data between methods, applications, and organizations without having to write complex, error-prone validation code at each endpoint. Moreover, schemas let applications automatically convert the text representation of a value stored in an XML file into a more useful typed data value, such as a date or object.
In contrast, plain text files don’t conform to a standard or to universal rules, aren’t well-suited to hierarchical or irregular data, carry little or no information about the data they contain, and aren’t usually accompanied by schemas. The result is a wide variety of application-specific formats that force developers to write custom code to parse and validate the contents.
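To make the well-formedness and namespace points concrete, here is a minimal sketch using Python’s standard `xml.etree.ElementTree` module. The namespace URIs, tag names, and the `is_well_formed` helper are assumptions invented for illustration; they do not come from any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define a <title> tag. The namespace URIs are
# hypothetical; they exist only to keep the two tags distinct after merging.
MERGED = """
<catalog xmlns:book="http://example.com/book"
         xmlns:job="http://example.com/job">
  <book:title>XML in Practice</book:title>
  <job:title>Technical Editor</job:title>
</catalog>
""".strip()


def is_well_formed(text):
    """Return True if the text follows XML's construction rules."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(MERGED)

# ElementTree expands each prefixed name to {namespace-uri}local-name,
# so the two <title> tags never collide.
tags = [child.tag for child in root]
book_titles = [e.text for e in root.findall("{http://example.com/book}title")]

print(is_well_formed(MERGED))            # properly nested document
print(is_well_formed("<a><b></a></b>"))  # overlapping tags are rejected
print(tags)
print(book_titles)
```

The same generic parser handles both cases with no format-specific code, which is the practical payoff of universal construction rules.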
These five capabilities led to widespread changes for developers.
XML Supplants HTML
First, HTML, which had been undergoing rapid evolution, became obsolete, mutating into a fixed XML vocabulary called XHTML. That process is still under way, partially because HTML tools vendors didn’t support XHTML as fast as they should have, and partially because Web developers didn’t see immediate advantages in XHTML, so they continued (and some still continue) to write HTML instead. If there’s one bright spot in the Eolas patent lawsuit decision, it’s that companies will have to alter large numbers of these obsolete HTML pages and thus may finally get the message that they should start writing Web pages that are XHTML compliant.
XML Captures Configuration Files
Look at almost any modern application, and you’ll find XML configuration files that control its behavior. Applications often let administrators and users control some aspects of their behavior through external files, called configuration files. For example, an application may need a database connection string that controls where it should store data. A more complex application may need to assign different connection strings based on a user’s role within the application. Simple applications used simple text or INI (initialization) files, while more complex applications used proprietary text-based formats or binary files. INI files had size limitations and were unsuitable for storing deep hierarchical data. Proprietary files often required a custom interface, and modifying them to accommodate changing application needs was difficult. Binary files weren’t human-readable. Modern applications use XML because it solves these problems: it consists of human-readable and modifiable text, supports deep hierarchical data, has a regular and verifiable structure, and accommodates structural changes easily.
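As a rough illustration of the role-based connection string scenario, the sketch below reads a small XML configuration document with Python’s standard library. The element names, attribute names, and connection strings are all hypothetical; real applications define their own configuration vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: one connection string per user role.
CONFIG = """
<appConfig>
  <connections>
    <connection role="admin" value="Server=db1;Database=app;User=admin_user"/>
    <connection role="reader" value="Server=db2;Database=app;User=read_only"/>
  </connections>
</appConfig>
""".strip()


def connection_string_for(config_text, role):
    """Return the connection string configured for a role, or None."""
    root = ET.fromstring(config_text)
    for entry in root.findall("./connections/connection"):
        if entry.get("role") == role:
            return entry.get("value")
    return None


print(connection_string_for(CONFIG, "reader"))
print(connection_string_for(CONFIG, "guest"))
```

Adding a new role, or nesting further settings under each connection, changes the document but not the parsing approach, which is the flexibility INI files lacked.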
XML Underlies Web Services
XML Marks Up GUIs
HTML defines user interfaces, but it’s a closed language, fixed by the W3C to a limited set of tags. That fixed nature acts as an unbreakable wall: any concept not already in HTML is not describable in HTML. Fortunately, unlike HTML, XML is not limited to specific tags; therefore, you can use it to describe any user interface. Products such as Mozilla (XUL) have already taken advantage of this, but XML-based UI descriptions will gain even more widespread penetration with the XAML UI description language built into Longhorn. Using XML, the same file can describe a UI implemented on any platform, yet one more example of the old adage that you can accomplish almost anything in programming by adding a layer of indirection.
XML Dominates Future File Systems
Since the early days of computing, file systems have consisted of hierarchical lists, composed of directories and files. Any directory can contain both files and other directories, a concept equivalent to “mixed-content” elements in XML, which can contain both data and other elements. But file systems work on a fixed attribute model, where each type of content has specific fixed attributes such as size, name, creation date, last modified date, last access date, etc. That fixed-attribute model is fast, but it greatly limits the meta-information that can accompany a file, and thus limits the way files can be organized. XML eliminates these restrictions by treating files as attachments to a customizable XML document.
It’s often extremely convenient to organize a file into several categories, or to provide comments that accompany the file but aren’t integrated into the file itself. For example, assume you have a file that contains an article about XML and XSLT. With a simple hierarchical file system, you have to decide where to place the file, perhaps somewhere such as c:\My Documents\XML Articles\XSLT. Worse, you have to remember where you’ve placed it to find it again (“Was that file in XML Articles or was it under Transformation Languages? Hmm”) or perform a full-text search to retrieve its location. Using an XML-based system, you can categorize the file in any number of categories, such as markup languages, transformation languages, XML, XSLT, Articles, etc., and then retrieve it using any of the associations. Further, an XML-based system has the potential to provide instant annotation capabilities for any file, not just those whose associated applications include annotation capabilities.
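The multi-category idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how any actual file system stores metadata; the `<files>` vocabulary, the file paths, and the `find_by_category` helper are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document: each file is an "attachment" described
# by any number of categories plus a free-form annotation.
INDEX = """
<files>
  <file path="articles/xslt-intro.doc">
    <category>XML</category>
    <category>XSLT</category>
    <category>Transformation Languages</category>
    <note>Draft; check the examples.</note>
  </file>
  <file path="articles/html-history.doc">
    <category>Markup Languages</category>
  </file>
</files>
""".strip()


def find_by_category(index_text, category):
    """Return the paths of all files tagged with the given category."""
    root = ET.fromstring(index_text)
    return [
        f.get("path")
        for f in root.findall("file")
        if any(c.text == category for c in f.findall("category"))
    ]


# The same file is reachable through any of its associations.
print(find_by_category(INDEX, "XSLT"))
print(find_by_category(INDEX, "Transformation Languages"))
```

The note element shows how an annotation can travel with a file without the file’s own application knowing anything about it.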
One Language to Bind Them All
The use of XML as a primary storage medium for code is increasing rapidly. I’m not talking about embedding code content in XML; I’m talking about XML as the representation, in other words, XML-based languages. XUL and XAML, along with several other display-oriented vocabularies, started this trend, using XML to describe items to be drawn on-screen. It’s a short step to including code intended for off-screen consumption as well. That’s happening too. The Water programming language was first out of the gate as a commercial product. Unfortunately, simply transferring code storage from plain-text to XML-formatted files doesn’t create much value. All the current XML-based languages do exactly that, by storing high-level code statements in XML.
No, the real solution hasn’t yet reached fruition because it still slows down processing too much. The real solution is to put standardized primitives (variable declarations, assignments, IF structures, loop structures, function and method declarations, etc.) rather than finished 4GL code into the XML code documents. Doing it that way makes it possible to transform the code into nearly any language capable of expressing those primitives in its own higher-level syntax. For example, a loop using an integer counter can be represented generically and then processed into syntax and keywords appropriate to any language. Obviously, each language would need its own translation capabilities to and from its proprietary syntax, but the end result would save billions of dollars by:
- Giving organizations a verifiable way to pass code into and out of the organization. Companies could use schemas and intelligent code parsers to determine what code does, whether it’s dangerous, and even what types of permissions that code would require to run properly.
- Freeing code from the restrictions of the original language. Hire a VB programmer to write some modules, and then later translate them into C++ and compile them with the rest of your application. Take your existing C++ DLLs and translate them to Java code.
- Making it easy to migrate code from one system to another. By translating existing code into standard primitives, and then transforming it into a more modern language, you can avoid many of the problems associated with upgrading from one version to another, or translating from one language to another. This works much the same way as with old Word files. If you load an old Word file into Word 2003, you can save it as XML. Similarly, if you loaded an existing code file into an XML-enabled code translator, you could subsequently save it as XML.
- Increasing code reusability. While the world has made great strides in code reusability already, there’s far more progress to be made. If organizations could truly take advantage of code that’s already been written and debugged, they could save enormous amounts of money.
Would such a translation layer be perfect? No. There are reasons for different languages. Each has unique concepts and, to some degree, constructs. However, most programmers know that most code isn’t language-specific, and could therefore be translated quite readily. Would it be efficient? No. From a machine-performance standpoint, an XML representation would be far less efficient than 4GL code. But translation doesn’t necessarily need to be efficient. Developers wouldn’t work directly with the XML representation; therefore speed would matter only when saving or during compilation. As with other code, hardware performance improvements will eventually make such a representation fast enough.
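The integer-counter loop mentioned earlier can be sketched as follows. The `<loop>` and `<call>` vocabulary is invented purely for illustration (no such standard exists), and the two emitters handle only this one primitive; a real translator would need to cover many more constructs.

```python
import xml.etree.ElementTree as ET

# Hypothetical primitive vocabulary: a counted loop whose body is a call.
LOOP = """
<loop counter="i" from="1" to="10">
  <call function="process">
    <arg>i</arg>
  </call>
</loop>
""".strip()


def _call(call_el):
    """Render a <call> element as function(arg, ...)."""
    args = ", ".join(a.text for a in call_el.findall("arg"))
    return f"{call_el.get('function')}({args})"


def to_python(loop_el):
    """Emit the loop using Python syntax (range excludes its upper bound)."""
    i = loop_el.get("counter")
    lo, hi = int(loop_el.get("from")), int(loop_el.get("to"))
    body = [f"    {_call(c)}" for c in loop_el.findall("call")]
    return "\n".join([f"for {i} in range({lo}, {hi + 1}):"] + body)


def to_cpp(loop_el):
    """Emit the same loop using C++-style syntax."""
    i = loop_el.get("counter")
    lo, hi = int(loop_el.get("from")), int(loop_el.get("to"))
    body = [f"    {_call(c)};" for c in loop_el.findall("call")]
    header = f"for (int {i} = {lo}; {i} <= {hi}; {i}++) {{"
    return "\n".join([header] + body + ["}"])


loop_el = ET.fromstring(LOOP)
python_src = to_python(loop_el)
cpp_src = to_cpp(loop_el)
print(python_src)
print(cpp_src)
```

Both outputs come from one language-neutral description, which is the property the argument depends on: the XML holds the intent, and each emitter supplies only syntax.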
OS: Who Cares? Give Me My XML
The ongoing discussion over which operating system is better for businesses and home users has become pointedly more heated lately, particularly the choice between the various Linux distributions, Windows, or the Mac OS, among others. Freedom of choice, compatibility, convenience, and cost (as well as widespread anger with Microsoft) are the key touchpoints in these debates.
XML is evolving in ways that will eventually make the choice of OS far less important. An operating system is simply a managed environment that facilitates a few tasks: running applications, storing applications and data, and displaying documents.
As you’ve seen, XML is set to fuel both file system (WinFS) and display (XAML, XUL) functionality in Windows. Similar capabilities for other OSes are likely not far behind. If you can capture the application management, data storage, and UI behavior in XML, you’ve essentially created a layer that can be moved between operating systems much more easily.
As I’ve already discussed, XML-formatted configuration files increasingly hold directives, settings, preferences, and meta-data for individual applications, which means XML is already being used to perform one portion of application management. Applications also need data, and XML has made significant inroads into data storage, data transfer, and data query capabilities as well. Although relational databases remain the primary repository for enterprise and large-scale application data, modern applications that work with the data are retrieving it as XML. Microsoft’s DataSets in .NET are one small example. For more indications, one need look no further than the fact that all major databases can now deliver XML-formatted data, accept XML data for update and insert operations, and are rapidly gaining the ability to store and query (see XQuery) data in native XML format.
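To show what retrieving relational data as XML can look like, here is a minimal sketch that serializes query results by hand using Python’s built-in `sqlite3` module. It does not use any database’s native XML features; the table, its data, the `<rows>` wrapper, and the `rows_to_xml` helper are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, query, row_tag):
    """Run a query and return its rows as an XML string."""
    cur = conn.execute(query)
    # Column names from the result set become child element names.
    columns = [d[0] for d in cur.description]
    root = ET.Element("rows")
    for row in cur:
        el = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            ET.SubElement(el, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical in-memory table standing in for enterprise data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

xml_text = rows_to_xml(conn, "SELECT id, name FROM customers ORDER BY id", "customer")
print(xml_text)
```

Any consumer with an XML parser can read the result without knowing which database produced it, which is the independence the article is describing.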
The essential point is this: Just as XML Web services provide a language-and-platform-independent layer between applications, XML configuration and management, data storage and display provide an equally language-and-platform-independent layer between operating systems. You’ll see the fruits of this added layer of indirection in years to come.
In the meantime, there’s a fairly simple way to stay abreast of the most important technological changes lurking in the middle distance: Keep a close eye on all advancements in XML and be prepared to modify your processes to comply with an XML-saturated technology future.