The Dos and Don'ts of XML
XML is a universal format for transferring hierarchically organized data. When properly formatted, XML documents are readable, self-explanatory, and platform-independent.
In practice, certain general XML features can be discarded: entities, DTD (Document Type Definition, now mostly replaced by Schemas), and processing instructions (programmers generally keep their "processing instructions" in programs). The following are some tips for making the best use of these features.
DTD Doesn't Know What the App Needs
XML parsers check whether an XML document is well-formed, while a DTD defines the document structure and can be used to check the semantic integrity of a document (whether a node is present, whether it has certain attributes, etc.). However, an application normally has its own semantic integrity criteria and knows better which pieces to throw away and whether to discard a document that does not contain the necessary data. A DTD is defined by the producing side, which has no clue what the requirements are on the receiving side. (I'm not even going to discuss why the producing side would produce data inconsistent with its own specifications.)
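As a minimal sketch of checking on the receiving side, the application can let the parser verify well-formedness and then apply its own criteria. The `order` document, its `id` and `sku` attributes, and the `read_order` helper below are hypothetical, made up only for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are illustrative only.
DOC = """
<order id="42">
  <item sku="A1" qty="2"/>
  <item sku="B7"/>
  <item qty="5"/>
  <note>gift wrap</note>
</order>
"""

def read_order(xml_text):
    """Return (order_id, items), or None if the document lacks what this app needs."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    order_id = root.get("id")
    if root.tag != "order" or order_id is None:
        return None  # the app chooses to discard the whole document
    items = []
    for item in root.findall("item"):
        sku = item.get("sku")
        if sku is None:
            continue  # the app chooses to throw away just this piece
        qty = int(item.get("qty", "1"))  # app-specific default, not the producer's
        items.append((sku, qty))
    return order_id, items

result = read_order(DOC)
print(result)
```

The `note` subnode is simply ignored: this particular receiver does not need it, and no producer-side declaration has to say so.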
Don't Multiply Entities
XML entities are like C macros: they save typing time, but they are hell to maintain. Entities save some bytes, but that hardly matters these days, at least for text files. Furthermore, using entities makes life harder on both the producing and receiving ends. There are only six entities that you cannot avoid:
With these six entities you will never need to use CDATA.
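To illustrate escaping instead of CDATA, here is a small sketch using Python's standard library. It relies only on the named entities that the XML specification predefines (`&lt;`, `&gt;`, `&amp;`, `&apos;`, `&quot;`); the sample string and the `code` element name are assumptions chosen for the example:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if (a < b && c > d) print("x")'

# escape() always handles &, < and >; quotes are added via the optional dict.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# ElementTree escapes text automatically when it serializes a node,
# so no CDATA section is needed to carry markup-like characters.
node = ET.Element("code")
node.text = raw
serialized = ET.tostring(node, encoding="unicode")

# Parsing the serialized form gives back the original text unchanged.
round_trip = ET.fromstring(serialized).text

print(escaped)
print(serialized)
```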
Order Does Not Matter
In a relational database, rows in a table have no specific order, and for good reason. Similarly, the subnodes of a node should not have any specific order. They do follow each other in the file, but semantically they are all created equal. Subnodes could and should be grouped into an unordered collection by the types specified in their start-tags and end-tags.
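The grouping described above can be sketched as follows. The `group_subnodes` and `summary` helpers and the `lib`/`book`/`cd` element names are hypothetical; the point is only that two documents whose subnodes appear in different file order compare as equal once they are grouped by tag:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def group_subnodes(node):
    """Group the direct subnodes of `node` by tag name, ignoring file order."""
    groups = defaultdict(list)
    for child in node:
        groups[child.tag].append(child)
    return dict(groups)

def summary(node):
    """Reduce each group to a sorted list of titles so groups compare as multisets."""
    return {tag: sorted(c.get("t") for c in kids)
            for tag, kids in group_subnodes(node).items()}

# Same subnodes, different order in the file.
a = ET.fromstring("<lib><book t='X'/><cd t='Y'/><book t='Z'/></lib>")
b = ET.fromstring("<lib><cd t='Y'/><book t='Z'/><book t='X'/></lib>")

print(summary(a) == summary(b))
```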
Imagine files in a JAR referencing the JAR, class members referencing the class, words referencing the sentence. Such weird structures shouldn't exist, right? Neither should subnodes have any knowledge of the nodes in which they are contained. You can deduce this information by looking at an XML file. The node/subnode relationship describes the container, not the contained node.
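Python's standard `xml.etree.ElementTree` happens to model this the same way: an element holds its subnodes, but a subnode keeps no link back to its container. When the containing node is needed, it can be deduced from the container side, as in this sketch (the `jar`/`class`/`member` names are only an illustration borrowed from the analogy above):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<jar><class name='A'><member name='x'/></class></jar>")
member = root.find("class/member")

# Standard-library elements carry no reference to the node that contains them.
has_parent_link = hasattr(member, "parent") or hasattr(member, "getparent")

# The relationship is recovered by walking the containers, not the contents.
parent_of = {child: parent for parent in root.iter() for child in parent}

print(has_parent_link)
print(parent_of[member].get("name"))
```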
Keep It Simple
Without a DTD, a file can be self-contained, with no references to third parties. Because subnodes have no order, you can easily move them around and regroup them without harming the contents. It also follows that if a node contains text, the text can be considered a single chunk, not a collection of fragments as in HTML. In XML, you can always wrap chunks of text in subnodes if you want to split it.