Model XML to Please Humans and Computers Alike

Modeling XML documents is often a balancing act between human readability and extensibility. But you can build an XML schema that gives you the best of both worlds by following these five heuristics.

"Heuristics" is a word that inspires fear and loathing among the non-technical. But it's actually one of the friendliest design-related concepts out there. This article suggests five heuristics for ensuring the human readability of your XML documents. These aren't iron-clad rules or complicated design patterns, so use them when you need to, and throw them out when you don't.

Why Bother?
That's the first question: When designing XML, why worry about humans at all? After all, SOAP makes few concessions to human readability, and it's a W3C standard.

The answer is: human authorship. SOAP is designed to be written and read by applications. But there are lots of XML dialects designed to be written by humans—XML Schema and XSLT spring to mind. Less ambitious examples of XML documents that need to be human-writable might include a complex config file, or an XML-based macro language.

Creating a dialect for human authorship isn't easy, but XML has some big advantages, and the biggest is platform support. XML is so popular, and there are so many class libraries and XML-aware editors, that I don't want to reinvent the wheel by writing my own parser and my own editing software. Working with XML gives you a lot of things for free (or at least at a bargain-basement price). Here are just a few:

  • Syntax-checking based on an XML Schema
  • Searching, whether programmatically or with XQuery
  • Processing with XSLT
  • Writing with one of the many good, cheap, XML-aware editors out there

What You Need
Familiarity with XML and XML Schema, and a good XML-aware text editor.

Introducing NoteML
As mentioned earlier, this article will teach you how to write XML that maximizes human readability. To do that, I'll create a sample application: an XML dialect to organize the output of my own human authorship. I wanted a way to manage all of my writing in a single place while emphasizing human readability and writability. In a typical day, I might write:

  • Personal e-mail
  • Work-related e-mail
  • Test cases
  • User stories
  • API documentation
  • Notes
  • Weblog entries

And maybe a whole lot of other stuff as well. The trouble is that, despite similarities in form and content, each of the categories above potentially has its own application and its own file format. That makes finding old pieces of writing difficult, since the search could involve several different applications—Outlook, Word, and TextEdit at the very least.

It's true that my penmanship is genuinely shameful, but there's another reason why I don't take notes on paper: computers make text-management easier. I call this dialect NoteML. There may be more practical ways to meet the same requirements. But an XML dialect for marking up notes does a good job of illustrating the tradeoffs between the needs of humans and the needs of software. This dialect should enable:

  • Searching. The trouble with notes is finding them months later when I actually need them. A proper model should make searching for relevant notes by date or by subject easier.
  • Processing. E-mail and blog entries don't do anyone any good if they just stay on my hard drive. Processing them with XSLT or application logic should make it easy to get data out of NoteML and into a final format ready for consumption by the outside world.
  • Writing. At the very least, this format should be able to take advantage of an XML editor. But, at some point in the future, maybe I’ll get around to writing an application based on this format.
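
As a rough illustration of the searching requirement, the sketch below filters notes by date and by subject using Python's standard-library `xml.etree.ElementTree`. The element and attribute names (`notes`, `note`, `date`, `subject`) and the sample data are placeholders invented for this example; they are not the final NoteML vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical NoteML-like sample; all names and values are assumptions.
SAMPLE = """\
<notes>
  <note date="2005-03-01" subject="groceries">Milk, eggs, coffee</note>
  <note date="2005-03-02" subject="xml">Attributes suit short metadata.</note>
  <note date="2005-04-10" subject="xml">Try an XML-aware editor.</note>
</notes>
"""


def find_notes(root, subject=None, month=None):
    """Return <note> elements matching an optional subject and YYYY-MM prefix."""
    matches = []
    for note in root.iter("note"):
        if subject is not None and note.get("subject") != subject:
            continue
        if month is not None and not note.get("date", "").startswith(month):
            continue
        matches.append(note)
    return matches


root = ET.fromstring(SAMPLE)
xml_notes = find_notes(root, subject="xml")
march_notes = find_notes(root, month="2005-03")

for note in xml_notes:
    print(note.get("date"), note.text)
```

Keeping the date and subject in predictable places is what makes a filter like this short. The same lookup against free-form text would need a custom parser.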

Listings 1 and 2 exemplify the two extremes. Listing 1 is a human-friendly format that requires a custom parser. Listing 2 shows well-formatted SOAP, which no person should be expected to write by hand.

The requirements of NoteML are simple. Eventually, this format should be extensible enough to include all kinds of writing: e-mail, spec documents, and everything else. For now, though, I'll just worry about three types of entry:

  • Generic. Sometimes, I just want to write a simple text note, but I may not have a fancy text editor handy. Maybe it's a grocery list. Or maybe it's a great idea that came to me in the shower and I want to get it down before I forget.
  • Notes on a book or a Web site. If I take notes on a specific topic, NoteML should be able to catalog my reactions as well as the sources I used.
  • Blog entries. I should be able to write a format-neutral post in NoteML, and transform it into HTML using XSLT.
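
One way to picture these three entry types is the sketch below: a made-up NoteML fragment with one entry of each kind, plus a few lines of Python that render the blog entry as HTML. Every element and attribute name here (`entry`, `type`, `source`, `title`, `body`) is an assumption for illustration only, and plain Python stands in for the XSLT step described above.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical NoteML fragment; the vocabulary is invented for this sketch.
SAMPLE = """\
<notes>
  <entry type="generic">
    <body>Milk, eggs, coffee</body>
  </entry>
  <entry type="source-notes">
    <source title="Object-Oriented Design Heuristics" author="Arthur J. Riel"/>
    <body>Heuristics are guidelines, not rules.</body>
  </entry>
  <entry type="blog">
    <title>XML for Humans &amp; Machines</title>
    <body>Readable markup is worth the effort.</body>
  </entry>
</notes>
"""


def blog_to_html(entry):
    """Render one blog entry as a minimal HTML fragment, escaping its text."""
    title = escape(entry.findtext("title", default=""))
    body = escape(entry.findtext("body", default=""))
    return f"<article><h1>{title}</h1><p>{body}</p></article>"


root = ET.fromstring(SAMPLE)
entry_types = [e.get("type") for e in root.findall("entry")]
blog_html = [blog_to_html(e) for e in root.findall("entry[@type='blog']")]

print(entry_types)
print(blog_html[0])
```

Using a single `entry` element with a `type` attribute is only one possible design. Separate element names per entry type would work just as well and might be easier to validate with a schema.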

Author's Note: This article borrows its approach from Arthur J. Riel's classic design textbook, Object-Oriented Design Heuristics. If you haven't read it, check it out. The book is filled with fundamental advice that even the most seasoned designer occasionally forgets.
