XML Reference

Tagging an XML document is, in many ways, similar to tagging an HTML document. Here are some of the most important guidelines to follow.

Rule #1: Remember the XML declaration
This declaration goes at the beginning of the file and alerts the browser or other processing tools that this document contains XML tags. The declaration looks like this:

<?xml version="1.0" encoding="UTF-8"?>

You can leave out the encoding attribute and the processor will use the UTF-8 default.
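
The effect of the encoding attribute can be seen with a short Python sketch. It assumes Python's standard xml.etree.ElementTree module as the processor; the NAME element and its text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document containing a non-ASCII character.
BODY = "<NAME>José</NAME>"

def parse_text(data: bytes) -> str:
    """Parse raw bytes and return the text of the root element."""
    return ET.fromstring(data).text

# No declaration: the parser assumes UTF-8, so UTF-8 bytes parse fine.
utf8_text = parse_text(BODY.encode("utf-8"))

# Latin-1 bytes need a declaration that names their encoding.
declared = '<?xml version="1.0" encoding="ISO-8859-1"?>' + BODY
latin1_text = parse_text(declared.encode("iso-8859-1"))

# Latin-1 bytes with no declaration are not valid UTF-8, so parsing fails.
try:
    parse_text(BODY.encode("iso-8859-1"))
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(utf8_text, latin1_text, undeclared_ok)
```

Leaving the declaration out is only safe when the file really is saved as UTF-8.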

Rule #2: Do what the DTD instructs
If you are creating a valid XML file, one that is checked against a DTD, make sure you know which tags are part of the DTD and use them appropriately in your document. Understand what each tag does and when to use it, and know the allowable values for each. If you follow those rules, the XML document will validate against the specified DTD.

Rule #3: Watch your capitalization
XML is case-sensitive: <P> is not the same as <p>. Be consistent in how you define element names. For example, use ALL CAPS, or use Initial Caps, or use all lowercase. It is very easy to create mismatched-case errors.

Also, make sure starting and ending tags use matching capitalization. If you start a paragraph with the <P> tag, you must end it with the </P> tag, not a </p>.

Rule #4: Quote attribute values
In HTML there is some confusion over when to enclose attribute values in quotes. In XML the rule is simple: enclose all attribute values in quotes, like this:

<element attribute="value">Ben Johnson</element>

Rule #5: Close all tags
In XML you must close all tags. This means that paragraphs must have corresponding end paragraph tags. Anchor names must have corresponding anchor end tags. A strict interpretation of HTML says we should have been doing this all along, but in reality, most of us haven’t.

Rule #6: Close empty tags, too
In HTML, empty tags, such as <BR> or <META>, do not close. In XML, empty tags do close. You can close them either by adding a separate close tag (</BR>) or by combining the open and close tags into one tag. You create the open/close tag by adding a slash, /, to the end of the tag, like this:

<BR/>
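
Rules #3 through #6 are all well-formedness rules, so any XML parser will reject a document that breaks them. The following is a minimal sketch of that check, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample tags are illustrative only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

checks = {
    "matching case":       is_well_formed("<P>text</P>"),
    "mismatched case":     is_well_formed("<P>text</p>"),         # breaks Rule #3
    "quoted attribute":    is_well_formed('<A NAME="top"></A>'),
    "unquoted attribute":  is_well_formed("<A NAME=top></A>"),    # breaks Rule #4
    "unclosed tag":        is_well_formed("<DOC><P>text</DOC>"),  # breaks Rule #5
    "empty, combined":     is_well_formed("<DOC><BR/></DOC>"),
    "empty, separate":     is_well_formed("<DOC><BR></BR></DOC>"),
    "empty, never closed": is_well_formed("<DOC><BR></DOC>"),     # breaks Rule #6
}

for label, ok in checks.items():
    print(f"{label}: {ok}")
```

Both ways of closing an empty tag pass, while each of the four rule violations is rejected.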


Examples
This table shows some common HTML tags and how they would be treated in XML.

Tag Comment End-Tag


Technically, in HTML, you’re supposed to close this tag. In XML, it’s essential to close it.

All elements in XML must have a start-tag and an end-tag.
  • This tag must be closed in XML in order to ensure a Well-Formed XML document.
    META tags are considered empty elements in XML, and they must close.

    Break tags are considered empty elements.
    This is an empty element tag.

    Element and Attribute Rules

    The first table contains the basic guidelines for creating element rules in an XML DTD.

    The second contains attribute value types.

    The third contains attribute default options.

    Element Rules:

    Symbol: #PCDATA
    Meaning: Contains parsed character data, or text.
    Example: <!ELEMENT POW (#PCDATA)>
    The POW element contains textual data.

    Symbol: (#PCDATA | element-name)*
    Meaning: Contains text mixed with another element. #PCDATA is always listed first in a rule, the choices are separated by bars, and the group ends with an asterisk.
    Example: <!ELEMENT POW (#PCDATA | NAME)*>
    The POW element may contain text and NAME elements, in any order and any number.

    Symbol: , (comma)
    Meaning: Use in this order.
    Example: <!ELEMENT POW (NAME, RANK, SERIAL)>
    The POW element must contain the NAME element, followed by the RANK element, followed by the SERIAL element.

    Symbol: | (bar)
    Meaning: Use either or.
    Example: <!ELEMENT POW (NAME | RANK | SERIAL)>
    The POW element must contain either the NAME element, or the RANK element, or the SERIAL element.

    Symbol: name (by itself)
    Meaning: Use one time only.
    Example: <!ELEMENT POW (NAME)>
    The POW element must contain the NAME element, used exactly one time.

    Symbol: name?
    Meaning: Use either once or not at all.
    Example: <!ELEMENT POW (NAME, RANK?, SERIAL?)>
    The POW element must contain the NAME element used exactly once, followed by one or no RANK element, and one or no SERIAL element.

    Symbol: name+
    Meaning: Use either once or many times.
    Example: <!ELEMENT POW (NAME+, RANK?, SERIAL)>
    The POW element must contain at least one but maybe more NAME elements, followed by one or no RANK element, and exactly one SERIAL element.

    Symbol: name*
    Meaning: Use once, use many times, or don’t use it at all.
    Example: <!ELEMENT POW (NAME*, RANK?, SERIAL)>
    The POW element must contain one, many, or no NAME elements, followed by one or no RANK element, and exactly one SERIAL element.

    Symbol: ( )
    Meaning: Indicates groups, which may be nested.
    Example: <!ELEMENT POW (#PCDATA | NAME)*>
    The POW element contains any number of uses of either or both text and the NAME element.
    Example: <!ELEMENT POW ((NAME*, RANK?, SERIAL)* | COMMENT)>
    The POW element may contain many instances of the group that contains one, many, or no NAME elements, followed by one or no RANK element, and exactly one SERIAL element. Or, it may contain one COMMENT element.
    Example: <!ELEMENT POW (NAME | RANK)+>
    The POW element must contain a NAME or RANK element. The NAME or RANK option may appear once or may be repeated many times.
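
The element rules behave much like regular expressions over the sequence of child elements, and a small Python sketch can make that concrete. This is an illustration only, not how a validating parser is implemented: it handles element-only rules (no #PCDATA), and the helper names model_to_regex and children_match are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

NAME_PATTERN = re.compile(r"[A-Za-z_][\w.-]*")

def model_to_regex(model):
    """Turn an element-only content model into a regex over child names.

    Every element name becomes the token "NAME;". The DTD comma simply
    disappears (a sequence is plain concatenation in a regex), while
    |, ?, +, * and parentheses keep the meaning given in the table above.
    """
    tokens = NAME_PATTERN.sub(
        lambda m: "(?:" + re.escape(m.group(0)) + ";)", model
    )
    return re.compile(re.sub(r"[\s,]", "", tokens))

def children_match(model, xml_text):
    """Check the direct children of the root element against the model."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag + ";" for child in root)
    return model_to_regex(model).fullmatch(sequence) is not None

print(children_match("(NAME+, RANK?, SERIAL)",
                     "<POW><NAME/><NAME/><SERIAL/></POW>"))
print(children_match("(NAME+, RANK?, SERIAL)",
                     "<POW><RANK/><SERIAL/></POW>"))
print(children_match("(NAME | RANK)+",
                     "<POW><NAME/><RANK/><NAME/></POW>"))
print(children_match("((NAME*, RANK?, SERIAL)* | COMMENT)",
                     "<POW><COMMENT/></POW>"))
```

The second call fails because the rule demands at least one NAME element before the optional RANK.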

    Attribute Values:

    Type: CDATA
    Meaning: Character data, text.
    Example: <!ATTLIST COMMENT category CDATA #REQUIRED>
    The COMMENT element has an attribute named category. This attribute contains letters, numbers, or punctuation symbols.

    Type: NMTOKEN
    Meaning: Name token, text with some restrictions. The value may contain letters and numbers, the only symbols it can contain are _, -, ., and :, and it cannot contain spaces.
    Example: <!ATTLIST COMMENT category NMTOKEN #REQUIRED>
    The COMMENT element has an attribute named category. This attribute contains a name token.

    Type: (value-1 | value-2 | value-3), a value list
    Meaning: A value list provides a set of acceptable options for the attribute to contain. In general, you should always include “other” as one of the options.
    Example: <!ATTLIST COMMENT category (red | green | blue | other) "other">
    The COMMENT element has an attribute named category. The category can be “red,” “green,” “blue,” or “other.” The default value is “other.”

    Type: ID
    Meaning: The keyword ID means that this attribute has an ID value that identifies this particular element. Each ID value must be unique within the document.
    Example: <!ATTLIST COMMENT category ID #IMPLIED>
    The COMMENT element has an attribute named category. The category will contain an ID value. ID and IDREF work together to create cross-references.

    Type: IDREF
    Meaning: The keyword IDREF means that this attribute has an ID reference value that points to another element’s ID value.
    Example: <!ATTLIST COMMENT category IDREF #IMPLIED>
    The COMMENT element has an attribute named category. The category will contain an IDREF value. ID and IDREF work together to let you cross-reference elements.

    Type: ENTITY
    Meaning: The keyword ENTITY means that this attribute’s value is the name of an entity. The entity must be defined elsewhere in the DTD.
    Example: <!ATTLIST COMMENT category ENTITY #IMPLIED>
    The COMMENT element has an attribute named category. The category will contain an entity name rather than text.

    Type: NOTATION
    Meaning: The keyword NOTATION means that this attribute’s value is the name of a notation. A notation is declared in the DTD and describes the format of non-XML data, such as an image type, so that an application knows how the information should be processed.
    Example: <!ATTLIST COMMENT category NOTATION (notation-1 | notation-2) #IMPLIED>
    The COMMENT element has an attribute named category. The category attribute will contain one of the listed notation names.
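
To show how ID and IDREF values pair up into cross-references, here is a minimal Python sketch. It assumes hypothetical attribute names id and ref, which are not taken from any real DTD, and it does the bookkeeping by hand because Python's standard xml.etree.ElementTree parser does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: the id and ref attribute names are assumptions.
DOC = """
<POW>
  <COMMENT id="c1">First sighting</COMMENT>
  <COMMENT id="c2" ref="c1">Follow-up to the first comment</COMMENT>
</POW>
"""

def resolve_references(root, id_attr="id", ref_attr="ref"):
    """Map each referring element to the element its IDREF points at."""
    by_id = {}
    for elem in root.iter():
        key = elem.get(id_attr)
        if key is not None:
            if key in by_id:
                raise ValueError(f"duplicate ID: {key}")
            by_id[key] = elem
    links = {}
    for elem in root.iter():
        ref = elem.get(ref_attr)
        if ref is not None:
            if ref not in by_id:
                raise ValueError(f"IDREF with no matching ID: {ref}")
            links[elem] = by_id[ref]
    return links

root = ET.fromstring(DOC)
links = resolve_references(root)
for source, target in links.items():
    print(source.get("id"), "->", target.get("id"), ":", target.text)
```

The two error cases mirror what a validating parser checks: every ID value is unique, and every IDREF value matches some ID in the same document.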

    Attribute Default Options:

    Type: #REQUIRED
    Meaning: The attribute must always be included when the element is used.
    Example: <!ATTLIST COMMENT category CDATA #REQUIRED>
    The COMMENT element has an attribute named category. This attribute contains letters, numbers, or punctuation symbols. The attribute must always be used with the element. If you omit the attribute, a validating parser will give you an error message.

    Type: #IMPLIED
    Meaning: The attribute is optional. If you see the keyword #IMPLIED, you know that this attribute will be ignored unless it is included in the element tag. It won’t take on any default values.
    Example: <!ATTLIST COMMENT category CDATA #IMPLIED>
    The COMMENT element has an attribute named category. You may use the attribute or omit the attribute, as the instance requires.

    Type: #FIXED
    Meaning: The attribute always has one specific value. If you enter the attribute in the element tag, it must have that value; if you omit it, the processor supplies the value for you.
    Example: <!ATTLIST COMMENT confirm CDATA #FIXED "yes">
    The COMMENT element has an attribute named confirm. Its value is always “yes,” whether or not you enter it in the element tag.

    Type: "value"
    Meaning: A value in quotes is the default value of this attribute. If you don’t enter the attribute in the element tag, the processor will assume the attribute has this default value.
    Example: <!ATTLIST COMMENT category (red | green | blue | other) "other">
    The COMMENT element has an attribute named category. If you don’t use the attribute in the element tag, the attribute will automatically receive the value “other.”
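
Default handling can be observed directly when the declarations sit in the document's internal DTD subset. The sketch below assumes Python's standard xml.etree.ElementTree module, whose underlying expat parser reads the internal subset and fills in declared values for attributes left out of the tag; the source attribute is hypothetical and added only to show #IMPLIED.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE POW [
  <!ATTLIST COMMENT category (red | green | blue | other) "other">
  <!ATTLIST COMMENT confirm CDATA #FIXED "yes">
  <!ATTLIST COMMENT source CDATA #IMPLIED>
]>
<POW>
  <COMMENT>no attributes entered</COMMENT>
  <COMMENT category="red" source="radio">two attributes entered</COMMENT>
</POW>"""

root = ET.fromstring(DOC)
first, second = root.findall("COMMENT")

# The parser supplies the quoted default and the fixed value;
# the #IMPLIED attribute stays absent unless it was entered.
print(sorted(first.attrib.items()))
print(sorted(second.attrib.items()))
```

Note that this parser does not validate, so it would not complain about a missing #REQUIRED attribute; only a validating parser reports that error.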

    Interaction Between Components

    XML, CSS, script, the DOM, and the browser work together to let you create interactive presentations of your content.

    [Figure: component interaction]

    XML Parsers
    Parsing is the process of checking the syntax of your document and creating the “tree structure.” If you are using a validating parser, the process will also compare the XML file to its DTD.
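
As a rough illustration of the two jobs described above, the Python sketch below checks syntax and then walks the resulting tree. It assumes the standard xml.etree.ElementTree module, which is a non-validating parser, and the sample POW content is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element values are made up.
DOC = (
    "<POW>"
    "<NAME>Ben Johnson</NAME>"
    "<RANK>Captain</RANK>"
    "<SERIAL>12345</SERIAL>"
    "</POW>"
)

def outline(element, depth=0):
    """Return the tree structure as a list of indented tag names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

def parse_to_outline(xml_text):
    """Check the syntax, then return the tree outline (None on a syntax error)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    return outline(root)

print("\n".join(parse_to_outline(DOC)))
print(parse_to_outline("<POW><NAME></POW>"))
```

A validating parser would add a third step after the syntax check: comparing each element and attribute in the tree against the DTD.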

    Online Parsers
    There are a number of online parsers. To use these, you typically type in the URI of your file and tell the process to begin.

    Downloadable Parsers
    There are many parsers that you can download and run on your local machine. Most of these require you to have either a Windows or UNIX machine. They are written in a variety of languages; this is a cross section of the many that are available.

    XML-related Technologies

    Markup
    HTML
    SGML
    XHTML
    XML

    Style
    CSS
    DHTML
    DSSSL
    HTC
    XSL

    Structure
    DTD
    DDML
    DCD
    X-Schema
    XML-Data

    Processing
    DOM
    XML-NS

    Linking
    XLink
    XPointer

    Query
    XQL
    XML-QL

    Metadata
    RDF

    XML Applications & Public DTDs
    AdMarkup
    HTML
    ICE
    Math ML
    SMIL
    XFDL

    XML Resources
    There are many resources about XML out there on the Web. Here are a few of our favorites.

    Specifications
    • Tim Bray’s Annotated XML Spec
      You’ll want to read the XML spec, but the best way to do so is with a helping hand beside you. That’s exactly what Tim Bray has created in this very useful dual-frame presentation of the spec. And if anyone knows the inside scoop on the spec, it is Tim, who was one of the key people in its development.
    • W3C XML Recommendation
      This is the XML Recommendation in its full form.

    Additional Resources
    There is a range of trade groups and publications that focus on XML issues. Here are a few that are most useful for XML in a Web design/development context.

    XML Vendors
    There are a number of companies working within the XML tools space. This section contains links to company information.

    Project Cool does not endorse these products; the items included here are some of the offerings on the market today or companies who are developing for the marketspace.

    If you have a machine capable of it, however, we do recommend that you download IE5 and Gecko and take a look at some of the XML demos, to get a sense of how XML plus CSS feels in a real browser.

    XML-capable browsers
    The 5.X browsers support XML documents.

    • IE 6.X
      Now in public release and available for download.
    • Gecko
      The layout engine that is part of Netscape Navigator.

    Document Authoring Tools
    If there is a weakness to implementing large-scale XML projects, it is the lack of good authoring tools. Hand-coding is possible, of course, but structured editors make the task much easier and less error-prone. These companies are working on XML document authoring products.

    • Adobe – FrameMaker+SGML is being adapted for use as an XML editor as well.
    • Arbortext – This SGML tools company is leaping into the XML game.
    • Macromedia – The popular Dreamweaver tool is supporting the creation of XML documents.
    • Corel XMetaL

    XML and Microsoft Office 2000
    We see frequent questions on the relationship between Office 2000 and XML. Our research shows that there is a relationship, but it isn’t quite what rumor holds it to be:

    Microsoft is an active supporter of XML and various XML initiatives. It is also incorporating XML support and XML structure in its various products.

    Microsoft is a perfect example of a company that needs – as an end user – a solution like XML. It has data that needs to move across different platforms without losing its meaning.

    One place this is very obvious is in its Office suite. Its customers want to move data between applications and also to share data with other users who may or may not be using the same applications or the same versions of applications.

    Remember, XML is a markup language. All a markup language does is identify pieces of a document so that another application can do something with those pieces. All word processors have a markup language. In the early days of text processing, WordStar, XyWrite, and WordPerfect used to let you see and edit their markup code; Word and MacWrite usually didn’t.

    Traditionally, markup languages were specific to an application. But what if you want to see a document and don’t own the exact same application in which it was created? There’s always ASCII, but that strips out most of the meaning. So we saw the rise of interchange formats. For text, Microsoft turned to Rich Text Format (.rtf) as its solution. The .rtf format provided a structure for opening up, say, a Word/Macintosh file in a Word/PC program or a WordPerfect program, but it was hardly the ideal solution.

    Three years ago, Microsoft decided that an emerging markup language called XML, in combination with HTML and CSS, provided a better option for marking up data.

    The goal, says Marc Olson, Microsoft Group Program Manager, was “to use HTML to make Office documents universally viewable by anyone with a browser on any platform. Embedded XML tags are used as a means for Office to re-open the HTML document with no loss of information or quality.” Microsoft is using the phrase “document round-tripping” to describe this back and forth process.

    Given the seemingly endless buzz around both XML and Microsoft, word on the street was that Office 2000 was to be an XML application. This notion quickly segued into a series of questions and complaints that Office 2000 doesn’t “support” XML correctly. As with many other XML stories, there’s a bit of truth and a lot of confusion in the XML/Office 2000 relationship.

    Microsoft is itself using an application of XML to create the underlying structure for the data in Office 2000 documents. It is not offering up Word as an XML editing tool or a means of creating well-formed documents. It is not saying that its XML application is a “standard.” Rather, the Office 2000 example is a good case study of how a company can apply the extensibility and metadata capabilities of XML to find a solution for itself. In the Office 2000 case, Microsoft is the “customer” of the technology, not the vendor.

    “Creating a standard format wasn’t the goal for Office 2000,” says Olson. “It was to find a way to let people view a document, regardless of whether they own Program X. If they have a browser that supports HTML, they can view any Office 2000 file. HTML provides the viewing framework while XML provides the framework for data stored within that HTML document.”

    Office uses XML in a very specific way: to structure the non-viewable contents of Word, PowerPoint, and Excel files. It has developed a set of tags and a data schema that defines the Office 2000 document set, much as you or I might create a set of tags and data schema for our “Flying Widget documentation set” or our inventory of tropical fish.

    Within the Office 2000 document is a namespace tag that identifies the schema. When a browser or other program that can display HTML sees the Office 2000 document, it uses the schema and its associated style information to first process and then display the document. When Office 2000 applications open a document, they use the schema to access the underlying XML data structure.
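
The role of the namespace can be sketched in a few lines of Python. The fragment below is a simplified, hand-written imitation of the kind of property block Office embeds in its HTML output, not a real saved file; the author name and page count are invented, and Python's standard xml.etree.ElementTree module stands in for the consuming application.

```python
import xml.etree.ElementTree as ET

# Namespace URI that the o: prefix is bound to in this fragment.
OFFICE_NS = "urn:schemas-microsoft-com:office:office"

# Simplified, illustrative fragment; the values are made up.
FRAGMENT = f"""
<o:DocumentProperties xmlns:o="{OFFICE_NS}">
  <o:Author>Example Author</o:Author>
  <o:Pages>3</o:Pages>
</o:DocumentProperties>
"""

root = ET.fromstring(FRAGMENT)

# ElementTree reports each tag as "{namespace-uri}local-name", so the
# namespace that identifies the schema travels with every element.
namespaces = {child.tag.split("}", 1)[0].lstrip("{") for child in root}
properties = {child.tag.split("}", 1)[1]: child.text for child in root}

print(namespaces)
print(properties)
```

Because every element carries the namespace, a program can tell these property tags apart from ordinary HTML tags that happen to share a name.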

    Some schemas are private, while others are publicly published. Microsoft has published its Office 2000 documents at http://www.msdn.microsoft.com/library. You have to navigate a little to get to it, but the material is under Office Developer Documentation, Office 2000 Documentation, Microsoft Office HTML and XML reference.

    So does Office 2000 use XML? Yes and no. It is an example of how one document set applies XML to meet a specific goal and that’s very exciting. But it isn’t the magic XML bullet – it is an application that shows that the “extensible” concept does indeed live up to its name.

