XML Documents

There are several steps to working with an XML document. This section focuses on the process of creating and tagging a document. You’ll need to follow the syntax rules of XML; if you know HTML, they will feel quite familiar.

Before You Begin
There are a handful of terms you’ll be hearing as you work with an XML document. Take a couple of minutes to become familiar with them before you begin:

  • Element
  • Attribute
  • Tag
  • Attribute value
  • Declaration
  • DTD

The XML Document
An XML file is a plain text file with XML markup tags. It has a .xml extension, like this: booklist.xml

Inside an XML File
An XML file contains three basic parts:

  1. A declaration that announces that this is an XML file;
  2. An optional definition about the type of document it is and what DTD it follows;
  3. Content marked up with XML tags.
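
As a minimal sketch of those three parts, the snippet below holds a small document in a Python string and parses it with the standard library's xml.etree.ElementTree module. The booklist, book, title, and author names and the booklist.dtd file are assumptions made for illustration. ElementTree checks syntax only; it does not fetch or apply the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element names and booklist.dtd are illustrative only.
BOOKLIST_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE booklist SYSTEM "booklist.dtd">
<booklist>
  <book>
    <title>Tying Knots</title>
    <author>Thadius J. Frog</author>
  </book>
</booklist>
"""

# Part 1 is the declaration, part 2 the optional DOCTYPE, part 3 the tagged content.
root = ET.fromstring(BOOKLIST_XML)
titles = [book.findtext("title") for book in root.findall("book")]
print(root.tag, titles)
```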

Types of XML Documents
There are two types of XML documents: well-formed and valid. Both follow XML syntax; the difference between the two is that a valid document is also checked against a DTD and a well-formed one is not.

Well-formed
Well-formed documents conform with XML syntax. They contain text and XML tags. Everything is entered correctly. They do not, however, refer to a DTD.

Valid
Valid documents not only conform to XML syntax but they also are error checked against a Document Type Definition (DTD). A DTD is a set of rules outlining which tags are allowed, what values those tags may contain, and how the tags relate to each other.

Typically, you’ll use a valid document when you have documents that require error checking, that use an enforced structure, or are part of a company- or industry-wide environment in which many documents need to follow the same guidelines.

DTDs
A Document Type Definition (DTD) is a set of rules that defines the elements, their attributes and attribute values, and the relationships between elements in a document.

When your XML document is processed, it is compared to its associated DTD to be sure it is structured correctly and all tags are used in the proper manner. This comparison process is called validation and is performed by a tool called a parser.

Remember, you don’t need to have a DTD to create an XML document; you only need a DTD for a valid XML document.

Here are a few reasons you’d want to use a DTD:

  • Your document is part of a larger document set and you want to ensure that the whole set follows the same rules.
  • Your document must contain a specific set of data and you want to ensure that all required data has been included.
  • Your document is used across your industry and needs to match other industry-specific documents.
  • You want to be able to error check your document for accuracy of tag use.

Deciding on a DTD
Using a DTD doesn’t necessarily mean you have to create one from scratch. There are a number of existing DTDs, with more being added every day.

Shared DTDs
As XML becomes widespread, your industry association or company is likely to have one or more published DTDs that you can use and link to. These DTDs define tags for elements that are commonly used in your applications. You don’t need to recreate these DTDs; you just point to them in the DOCTYPE definition in your XML file, and follow their rules when you create your XML document.

Some of these DTDs may be public DTDs, like the HTML DTD. Others may belong to your company. If you are interested in using a DTD, ask around and see if there is a good match that already exists.

Create Your Own DTD
Another option is to create your own DTD. The DTD can be very simple and basic or it can be large and complex. The DTD will be a reflection of the needs of your document.

It is perfectly acceptable to have a DTD with just four or five basic elements if that is what your document needs. Don’t feel that creating a DTD necessarily needs to be a huge undertaking.

However, if your documents are complex, do plan on setting aside time (several days or several weeks) to understand the document and the document elements and create a solid DTD that will really work for you over time.

Make an Internal DTD
You can insert DTD data within your DOCTYPE definition. If you’ve worked with CSS styles, you can think of this as being a little like putting style data into your file header. DTDs inserted this way are used in that specific XML document. This might be the approach to take if you want to validate the use of a small number of tags in a single document or to make elements that will be used only for one document.

Remember, the primary use for a DTD is to validate that the tags you enter in your XML document are entered as specified in the DTD. It is an error-checking process that ensures your data conforms to a set of rules.
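
The sketch below shows one way an internal DTD might look, with the rules placed inside the DOCTYPE definition. The memo document and its to, from, and body elements are hypothetical. Python's xml.etree.ElementTree reads the internal DTD without complaint but does not validate against it, so treat this as an illustration of the layout rather than of validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo document whose DTD lives inside the DOCTYPE definition.
MEMO_XML = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (to, from, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<memo>
  <to>Editorial staff</to>
  <from>Production</from>
  <body>Pages are due Friday.</body>
</memo>
"""

# ElementTree parses the document but does not enforce the DTD rules;
# a validating parser would be needed for that.
memo = ET.fromstring(MEMO_XML)
children = [child.tag for child in memo]
print(children)
```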

XML Syntax
Tagging an XML document is, in many ways, similar to tagging an HTML document. Here are some of the most important guidelines to follow.

Rule #1: Remember the XML Declaration
This declaration goes at the beginning of the file and alerts the browser or other processing tools that this document contains XML tags. The declaration looks like this: <?xml version="1.0" encoding="UTF-8"?>

You can leave out the encoding attribute and the processor will use the UTF-8 default.

Rule #2: Do What the DTD Instructs
If you are creating a valid XML file, one that is checked against a DTD, make sure you know what tags are part of the DTD and use them appropriately in your document. Understand what each does and when to use it. Know what the allowable values are for each. Follow those rules. The XML document will validate against the specified DTD.

Rule #3: Watch Your Capitalization
XML is case-sensitive: <P> is not the same as <p>. Be consistent in how you define element names. For example, use ALL CAPS, or use Initial Caps, or use all lowercase. It is very easy to create mismatched-case errors.

Also, make sure starting and ending tags use matching capitalization. If you start a paragraph with the <P> tag, you must end it with the </P> tag, not a </p>.
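
To see the capitalization rule in action, here is a small sketch that asks Python's xml.etree.ElementTree parser whether two snippets are acceptable. The is_well_formed helper is a name invented for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Same element, once with matching case and once with a lowercase end tag.
matching = is_well_formed("<P>Start and end tags match.</P>")
mismatched = is_well_formed("<P>The end tag uses a different case.</p>")
print(matching, mismatched)
```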

Rule #4: Quote Attribute Values
In HTML there is some confusion over when to enclose attribute values in quotes. In XML the rule is simple: enclose all attribute values in quotes, like this:

<element attribute="value">Ben Johnson</element>
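
The following sketch compares a quoted and an unquoted attribute value using Python's xml.etree.ElementTree. The person element and role attribute are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names, used only for illustration.
quoted = '<person role="author">Ben Johnson</person>'
unquoted = '<person role=author>Ben Johnson</person>'

person = ET.fromstring(quoted)
role = person.get("role")
print(role)

# The unquoted version is not well-formed, so the parser refuses it.
try:
    ET.fromstring(unquoted)
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
print(unquoted_accepted)
```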

Rule #5: Close All Tags
In XML you must close all tags. This means that paragraphs must have corresponding end paragraph tags. Anchor names must have corresponding anchor end tags. A strict interpretation of HTML says we should have been doing this all along, but in reality, most of us haven’t.

Rule #6: Close Empty Tags, Too
In HTML, empty tags, such as <br>, do not close. In XML, empty tags do close. You can close them either by adding a separate close tag (</br>) or by combining the open and close tags into one tag. You create the open/close tag by adding a slash, /, to the end of the tag, like this: <br/>
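
As a rough check of the two closing rules, the sketch below feeds three versions of the same paragraph to Python's xml.etree.ElementTree parser. The p and br element names are borrowed from HTML for illustration.

```python
import xml.etree.ElementTree as ET

candidates = {
    "unclosed empty tag": "<p>First line<br>second line</p>",
    "separate close tag": "<p>First line<br></br>second line</p>",
    "combined open/close tag": "<p>First line<br/>second line</p>",
}

# Record which versions the parser accepts as well-formed.
results = {}
for label, text in candidates.items():
    try:
        ET.fromstring(text)
        results[label] = True
    except ET.ParseError:
        results[label] = False
print(results)
```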


Examples
These examples show some common HTML tags and how they would be treated in XML.

  • <P> (end-tag: </P>): Technically, in HTML, you’re supposed to close this tag. In XML, it’s essential to close it. All elements in XML must have a start-tag and an end-tag.
  • <LI> (end-tag: </LI>): This tag must be closed in XML in order to ensure a well-formed XML document.
  • <META> (closed as an empty tag): META tags are considered empty elements in XML, and they must close.
  • <BR> (closed as <BR/>): Break tags are considered empty elements. This is an empty element tag.

    Well-formed XML
    A document that conforms to the XML syntax rules is called “well-formed.” If all your tags are correctly formed and follow XML guidelines, then your document is considered a well-formed XML document. That’s one of the nice things about XML: you don’t need to have a DTD in order to use it.

    Begin the Well-formed Document
    To begin a well-formed document, type the XML declaration: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>

    If you are embedding XML, it will go after the opening HTML tags and before any JavaScript.

    If you are creating an XML-only document, it will be the first thing in the file.

    Version
    You must include the version attribute for the XML declaration. The version is currently “1.0.” Defining the version lets the browser know that the document that follows is an XML document, using XML 1.0 structure and syntax.

    Standalone
    The standalone attribute declares whether the document “stands alone.” With standalone="yes", the application that is processing this document knows that it doesn’t need to look for a DTD and validate the XML tags.

    Encoding
    The encoding attribute declares the encoding of the document. In this case, the encoding is UTF-8, which is the default encoding for XML. You can leave off this attribute and the processor will default to UTF-8. If you include it, it goes after version and before standalone.
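
The sketch below tries a few forms of the declaration with Python's xml.etree.ElementTree. The note root element is hypothetical. The XML 1.0 rules place the attributes in the order version, encoding, standalone, so the last form, with standalone ahead of encoding, should be rejected.

```python
import xml.etree.ElementTree as ET

declarations = {
    "all three attributes": '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>',
    "encoding left off": '<?xml version="1.0" standalone="yes"?>',
    "version only": '<?xml version="1.0"?>',
    "standalone before encoding": '<?xml version="1.0" standalone="yes" encoding="UTF-8"?>',
}

accepted = {}
for label, declaration in declarations.items():
    try:
        # <note/> is a hypothetical root element, written as an empty tag.
        ET.fromstring(declaration + "<note/>")
        accepted[label] = True
    except ET.ParseError:
        accepted[label] = False
print(accepted)
```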

    Remember the Root Element
    After the declaration, enter the tag for the root element of your document. This is the top-most element, under which all elements are grouped.

    Follow XML Syntax
    Now, enter the rest of your content. Remember to follow XML syntax:
    • Remember that capitalization matters;
    • Quote all attribute values;
    • Close all tags;
    • Remember to close empty tags too, like this: <br/>


    Pretty easy, isn’t it? That’s all there is to it!

    Valid XML
    A valid document conforms to the XML syntax rules and follows the guidelines of a Document Type Definition (DTD).

    The process of comparing the XML document to the DTD is called validation. This process is performed by a tool called a parser.

    Begin the Valid XML Document
    To begin a valid document, type the XML declaration: <?xml version="1.0" encoding="UTF-8" standalone="no"?>

    If you are embedding XML, it will go after the opening HTML tags and before any JavaScript.

    If you are creating an XML-only document, it will be the first thing in the file.

    Version
    You must include the version attribute for the XML declaration. The version is currently “1.0.” Defining the version lets the browser know that the document that follows is an XML document, using XML 1.0 structure and syntax.

    Standalone
    The standalone=”no” attribute tells the computer that it must look for a DTD and validate the XML tags.

    Encoding
    Finally, declare the encoding of the document. You can leave off this attribute and the processor will default to UTF-8.

    Create a DOCTYPE Definition
    The second element in a valid XML document is the DOCTYPE definition. This identifies the type of document and DTD in use.

    If you look at HTML source files, you’ll often see a !DOCTYPE definition, especially if the file was created by a WYSIWYG tool. The DOCTYPE definition points to an HTML DTD.

    In a valid XML file, !DOCTYPE tells the program that is processing your XML file two things: the name of the type of document and the name and location of the DTD against which to validate the file’s contents.

    The DOCTYPE definition looks like this: <!DOCTYPE type-of-doc SYSTEM/PUBLIC "dtd-name">

    !DOCTYPE
    This says that you are defining the DOCTYPE.

    type-of-doc
    This is the name of the type of document contained in this file. Typically, this is the same name as the DTD.

    SYSTEM/PUBLIC
    SYSTEM tells the processor to look for the private DTD at the following location. PUBLIC tells the processor to look for a public DTD at the following location.

    “dtd-name”
    The URL after SYSTEM or PUBLIC is the name of the DTD file. By convention, DTD files end with the extension .dtd.

    If you want, instead of pointing to an external DTD, you could place the DTD information within the DOCTYPE definition, making it local to your XML document. You should do this only if you want to define a few simple elements and you want them permanently attached to a particular document.
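
Here is a minimal sketch of a DOCTYPE definition that points to an external DTD, read back with Python's xml.dom.minidom. The booklist document type and the booklist.dtd file name are assumptions for illustration. minidom records the DOCTYPE details but does not download the DTD or validate against it.

```python
from xml.dom import minidom

# Hypothetical file and document-type names, used only for illustration.
BOOKLIST_XML = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE booklist SYSTEM "booklist.dtd">
<booklist>
  <book>Tying Knots</book>
</booklist>
"""

document = minidom.parseString(BOOKLIST_XML)
doctype = document.doctype

# type-of-doc, followed by the DTD location given after SYSTEM.
print(doctype.name, doctype.systemId)
# The root element carries the same name as the document type.
print(document.documentElement.tagName)
```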

    Remember the Root Element
    After the declaration, enter the tag for the root element of your document. This is the top-most element, under which all elements are grouped.

    Follow XML Syntax
    Now, enter the rest of your content. Remember to follow XML syntax:
    • Remember that capitalization matters;
    • Quote all attribute values;
    • Close all tags;
    • Remember to close empty tags too, like this: <br/>


    Elements
    Elements are the basic building blocks of XML (and HTML, for that matter). Each element is a piece of data, identified by a tag. The tag contains the name of the element and any of its attributes, like this:

    <author dob="1864">Thadius J. Frog</author>

    Thadius J. Frog is now identified as an author element. This particular author element has a date of birth (dob) attribute value of 1864.
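
As a quick sketch, Python's xml.etree.ElementTree can pull that author element apart into its name, its attributes, and its data.

```python
import xml.etree.ElementTree as ET

author = ET.fromstring('<author dob="1864">Thadius J. Frog</author>')

print(author.tag)     # the element name
print(author.attrib)  # the attribute names and values
print(author.text)    # the data the element marks up
```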

    Choose Your Own
    XML is an extensible markup language. This means you create a set of elements that work for your content, and that you’ll be able to use consistently within the document.

    Whether you use a DTD or not, you’ll still want to sit down and write a list of the element names that you will be using in your document. XML is case-sensitive, so as you’re thinking about the element names, be sure to think about how you capitalize them also.

    Select names that are both easy to remember and easy to type. Ideally, your tags should have some inherent meaning too. This makes them easier to use. For example, if you want to identify “last name” as an element, consider naming the element something like “last-name” or “surname.”

    Be consistent in your use of names. It is easier to apply one set of general rules to 20 different tags than it is to remember eight discrete tags that follow no particular pattern. For example, if your document is a listing of classes, you could use these elements:





    But you’re just asking for confusion!

    There’s a mix of capitalization. There’s a mix of abbreviations and full words. In one case the word “name” is the first part of the tag; in the other it is the second part of the tag. There is no logic to help you remember this set of names.

    Wouldn’t names like this be easier to use?




    These names are all lowercase, full words, no plurals: an easy set of criteria to remember.
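
One way to hold yourself to a naming rule is to test candidate names against a pattern. The sketch below assumes a convention of lowercase words joined by hyphens. Both lists of names are hypothetical stand-ins for a class listing, not the names from the original article.

```python
import re

# Assumed convention: lowercase words joined by hyphens.
NAME_PATTERN = re.compile(r"[a-z]+(-[a-z]+)*")

def follows_convention(name):
    """Return True if an element name is all lowercase, hyphen-separated words."""
    return NAME_PATTERN.fullmatch(name) is not None

# Hypothetical element names for a listing of classes.
inconsistent = ["CLASSNAME", "instr", "Name-Room", "startTimes"]
consistent = ["class-name", "instructor-name", "room-name", "start-time"]

print([follows_convention(name) for name in inconsistent])
print([follows_convention(name) for name in consistent])
```

A pattern like this catches capitalization problems, but it cannot tell an abbreviation such as instr from a full word, so a written list of agreed names is still worth keeping.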

    Focus on Structure, Not Format
    One of the goals of using XML is to separate structure (“this is an author”) from format (“display this in 10 point Helvetica”). Elements remain identified as elements, no matter what platform you move the data to. An XML document is completely interpretable.

    When you think about elements, think about the role they play and the data they contain. Don’t think about how the elements will look on the page. Appearance is handled separately.

    You are using elements to identify data within your document as playing a certain role or belonging to a certain category of data.

    Displaying Elements
    You can use any tag name you want, as long as you follow proper XML syntax. Of course, those tags alone won’t do anything. They will just sit there quietly, marking up your data.

    After your data is marked up, you’ll use style sheets or other processing tools to display the XML document. You can control the display based on information contained in the elements.

    Using Elements
    In a well-formed XML document, you can insert any element tag you want, as long as you follow proper syntax.

    In a valid XML document, only the elements which are specified in the DTD will pass muster. If you randomly add other elements, their use will be flagged as an error.

    When you use elements in an XML document, you must follow standard XML syntax:

    • The element name surrounds the data which it defines: a start tag comes before the data (for example, Tying Knots) and a matching end tag comes after it.
    • All elements, including empty elements, must end. This means having an open and close tag for regular elements and a tag that closes with a slash for empty elements.
    • The element name is case sensitive: a name typed in capitals is not the same as that name typed in lowercase.

    DTDs and Elements
    One of the ways to define and codify all your elements is to create a DTD. A DTD defines the allowable elements, their attributes (if any), and their relationship to other elements.

    By validating your XML document against a DTD, you can test to be sure that elements in the documents are being used correctly.

    Attributes
    Attributes provide additional information about elements.

    You use elements and attributes all the time in HTML. For example, in HTML, a tag such as <H1 align="center"> includes an element (H1), an attribute (align), and an attribute value (center).

    In HTML, attributes allow you to specify additional information about your elements. Often this information is formatting-related, such as align or size. In XML, attributes allow you to specify additional data about an element, but it is never formatting-related. It is, instead, additional data about that particular element.

    Let’s say, for example, you’re creating documents about late 20th century popular music. In your DTD you’ve created an element called song which identifies each musical title. You have music that falls into different decade categories: the 70s, the 80s, and the 90s. You can give the song element an attribute called era. Now, you’ll be able to know from what era each song dates.

    By using an attribute, you can identify different versions of the same song — “I’ve Got You Babe” from the 1960s and “I’ve Got You Babe” from the 1980s. Later on, you can use this data to display all 70s songs in green, or to sort the displayed titles by era.

    You would use the attribute like this:

    <song era="60s">I’ve Got You Babe</song>

    <song era="70s">Billy Don’t Be a Hero</song>

    <song era="80s">I’ve Got You Babe</song>

    “I’ve Got You Babe” is identified as a “song” element with an “era” attribute value of “60s”. “Billy Don’t Be A Hero” is identified as a “song” element with an “era” attribute value of “70s”. “I’ve Got You Babe” is identified as a “song” element with an “era” attribute value of “80s”.

    Attributes and their allowable values are created in your DTD, when you specify elements. They are specified through an attribute list. Like element names, attribute names are case-sensitive, so be aware of your use of capitalization when you select and use attribute names.

    One other important thing to remember about attributes in XML tags is that the attribute values must always be contained inside quotes. In HTML it’s a mixed bag, but in XML the rule is easy to remember: quote all attribute values.
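
The sketch below puts the song element and its era attribute into a small document, with an attribute list in an internal DTD, and then groups the titles by era using Python's xml.etree.ElementTree. The playlist root element and the DTD lines are assumptions for illustration, and ElementTree does not enforce the attribute list.

```python
import xml.etree.ElementTree as ET

# The playlist root element and the DTD lines are assumptions for illustration.
PLAYLIST_XML = """<?xml version="1.0"?>
<!DOCTYPE playlist [
  <!ELEMENT playlist (song+)>
  <!ELEMENT song (#PCDATA)>
  <!ATTLIST song era (60s | 70s | 80s | 90s) #REQUIRED>
]>
<playlist>
  <song era="60s">I've Got You Babe</song>
  <song era="70s">Billy Don't Be a Hero</song>
  <song era="80s">I've Got You Babe</song>
</playlist>
"""

playlist = ET.fromstring(PLAYLIST_XML)

# Group song titles by the value of their era attribute.
by_era = {}
for song in playlist.findall("song"):
    by_era.setdefault(song.get("era"), []).append(song.text)
print(by_era)
```

Because the era travels with each song, the two versions of the same title stay distinct, and a style sheet or other tool could later sort or color the titles by that value.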

    Comments
    Comments are a way to add your own notes to an XML document. The browser and the XML processors will ignore anything inside comments.

    You aren’t going to remember what you were thinking three months later when you return to edit the document, so don’t be afraid to add comments as reminders or as markers of work that you have done.

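    To create a comment:

    1. Type a less than sign, followed by an exclamation point and two dashes, like this: <!--
    2. Type the text of your comment.
    3. End the comment with two dashes and a greater than sign, like this: -->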
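
As a small illustration, the sketch below parses a document that contains one comment. The booklist and title names are hypothetical. By default, Python's xml.etree.ElementTree discards comments while parsing, which matches the idea that processors ignore them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with one comment between its elements.
NOTES_XML = """<booklist>
  <!-- Reminder: confirm the publication date before release. -->
  <title>Tying Knots</title>
</booklist>
"""

booklist = ET.fromstring(NOTES_XML)

# The comment is skipped, so only the title element shows up as a child.
child_tags = [child.tag for child in booklist]
print(child_tags)
```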

    CDATA
    CDATA stands for “character data.” Character data are letters, numbers, and other symbols that are used exactly as they are typed. They are not parsed or processed, or treated as if they have any special meaning.

    You can create a CDATA section within your XML document. A CDATA section is a handy way to show code examples or to use characters, such as < and &, that would otherwise take on a special meaning. You can use CDATA instead of using a series of &lt; entities, for example.

    To create a CDATA section:

    1. At the place in the document where you want the CDATA section to appear, begin a CDATA definition with the less than sign and an exclamation point.

      <!

    2. Type an open square bracket and the letters CDATA.

      [CDATA

    3. Type another open square bracket.

      [

    4. Now type the CDATA itself. In this example, we are typing some sample code.

      <NAME common="…" breed="…">Sir Fredrick of Ledyard’s End</NAME>

    5. End the section with two closing square brackets and a greater than symbol.

      <![CDATA[<NAME common="…" breed="…">Sir Fredrick of Ledyard’s End</NAME>]]>

    Here is how a CDATA section might be used inside a document:


    Entering a Kennel Club Member



    Enter the member by the name on his or her papers. Use the NAME tag. The NAME tag has two attributes. Common (all in lowercase, please!) is the dog’s call name. Breed (also in all lowercase) is the dog’s breed. Please see the breed reference guide for acceptable breeds. Your entry should look something like this:



    <![CDATA[<NAME common="…" breed="…">Sir Fredrick of Ledyard’s End</NAME>]]>
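
To show what a processor does with a CDATA section, the sketch below parses one with Python's xml.etree.ElementTree. The instructions and example element names and both attribute values are hypothetical. The markup inside the section comes back as plain text rather than as a NAME element.

```python
import xml.etree.ElementTree as ET

# The instructions and example names and both attribute values are hypothetical.
GUIDE_XML = """<instructions>
<example><![CDATA[<NAME common="freddy" breed="springer spaniel">Sir Fredrick of Ledyard's End</NAME>]]></example>
</instructions>
"""

guide = ET.fromstring(GUIDE_XML)
sample = guide.findtext("example")
name_element = guide.find("example/NAME")

# The sample code is returned exactly as typed; no NAME element was created.
print(sample)
print(name_element)
```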

    Namespaces
    Namespaces are a way of using elements from more than one DTD within the same XML document.

    Sometimes you may be working with material that draws on several sets of element tags. For example, you might have an online store selling tropical fish and you’d like to use the same tag name to identify both the geographic location from which each species comes and the wholesaler from whom you buy it. Namespaces are a way to do this.

    An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names. In practice, namespaces let you match a tag you are using with a particular set of tags.

    In the beginning of your document (or at the start of a particular element of your document), you identify the namespaces you’ll be using and where the tag information is located. Then, when you use the tag to identify an element in your document, you precede it with the appropriate namespace name.

    Declaring Namespaces
    At the beginning of your document, you’ll want to identify the namespaces you are using in your document. This process is called declaring the namespace. In this example, you are creating a namespace called “sales.” The URI for sales is the mythical fishworld.org/schema: xmlns:sales="http://fishworld.org/schema"

    Using Namespaces
    When you use the tag to create the element that is defined in one of the namespaces, the namespace is the first part of the tag, like this:

    <sales:tag-name>Fish-o-Rama Wholesalers and Suppliers to the Trade</sales:tag-name>

    When you use your own tag you just use the tag name, like this:

    <tag-name>Mexico, Central America</tag-name>
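
Putting the declaration and the two kinds of tags together might look like the sketch below, read with Python's xml.etree.ElementTree. The catalog, fish, and source names are hypothetical; the sales prefix and the mythical fishworld.org/schema URI come from the text.

```python
import xml.etree.ElementTree as ET

# The catalog, fish, and source names are hypothetical; the sales prefix and
# the mythical fishworld.org/schema URI come from the text.
CATALOG_XML = """<catalog xmlns:sales="http://fishworld.org/schema">
  <fish>
    <sales:source>Fish-o-Rama Wholesalers and Suppliers to the Trade</sales:source>
    <source>Mexico, Central America</source>
  </fish>
</catalog>
"""

catalog = ET.fromstring(CATALOG_XML)
namespaces = {"sales": "http://fishworld.org/schema"}

# The same tag name resolves to two different elements.
wholesaler = catalog.findtext("fish/sales:source", namespaces=namespaces)
origin = catalog.findtext("fish/source")
print(wholesaler)
print(origin)
```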

    In January 1999, Namespaces became a W3C recommendation.

    XML Entities
    An entity is a shortcut to a set of information.

    When you use an entity, it “expands” to its full meaning, but you need only type the shorter entity name during data entry. You might think of an entity as being a bit like a macro: it is a set of information that can be used by calling one name.

    XML defines two types of entities. The general entity, which we’ll talk about here, is used in XML documents. The parameter entity is used in DTDs. General entities are easy to spot: they begin with an ampersand and end with a semicolon, like this:

    &entity-name;

    Uses for Entities
    Entities are a way to make entering and managing data easier.

    You’ve probably already used entities without calling them that. If you’ve ever entered the characters &lt; to create the < symbol, you’ve used an entity. This keystroke combination is a standard predefined entity in both HTML and XML that lets you access a particular ASCII character without having to memorize the character set number.

    Here are a few reasons you might want to define and use entities:

    • Entities save typing. Suppose you have a paragraph, like a copyright notice, that you use in every single document. You could type that notice over and over again. Or, you could use an entity to call it forth in place.
    • Entities can reduce errors. By the 101st time you type that copyright notice, it is likely your poor fingers will be so tired you’ll make an error and set your copyright for 1989 instead of 1999. Using an entity can reduce the potential for these types of errors.
    • Entities are easy to update. When it is time to update that copyright notice, with an entity you can make the change in one place and be done with it. Without an entity you’d be searching and replacing throughout your document set.
    • Entities can act as placeholders for TBD information. Maybe legal hasn’t quite finalized what they want that copyright notice to say. That doesn’t have to stop production: you can use an entity, and when the final wording comes down, the entity will automatically display the new, corrected version in all your documents.

    You can get quite creative with the use of entities, and even have documents that are constructed entirely from entities. Here’s an example:

    You want to create different documents, each containing a set of bios for members of your staff. You’ll have an executive set, a set for each product line, a set for six different regions around the world, and so on; subsets of the same content appear in each.

    One approach you could take is creating 10 or 12 separate flat files, with the appropriate biography information in each. But an easier way is to create a small file for each bio, then call each into the executive page, the European page, the Flying Toys Division page and so on via an entity.

    Here’s how the content code for your Flying Toys Division Page might look. Upon display, the entities would expand and you’d see the full bios of each person. If you needed to change the bios, you could do it in one place. If the product manager changed, all your pages would be automatically updated with the new person.

    The Faces Behind Flying Toys!
    &bio-ft-div-head;
    &bio-ft-prod-mgr;
    &bio-ft-designer;
    &bio-ft-lead-engineer;

    Defining Entities
    You can define entities in your local document as part of the DOCTYPE definition. You can also link to external files that contain the entity data. This, too, is done through the DOCTYPE definition. A third option is to define the entities in your external DTD.

    Use a local definition when the entity is being used only in this one particular file. Use a linked, external file when the entity is being used in many document sets.

    To define an entity:

    1. Start your DOCTYPE definition as usual, like this: <!DOCTYPE type-of-doc

    2. Now mark that you are defining some data by entering a square bracket: [

    3. Start the entity definition, with a less than sign, an exclamation mark, and the word ENTITY, all in caps: <!ENTITY

    4. Type the name of the entity. Type it using the capitalization that you will use when calling it later on: <!ENTITY entity-name

    5. If you are defining the entity locally, type the value of the entity, surrounded by quotes, and then close the entity definition with a greater than sign: <!ENTITY entity-name "entity value">

    6. If you are defining an entity in an external, ASCII text file, put in a pointer to the external file, then close the entity definition with a greater than sign: <!ENTITY entity-name SYSTEM "file-name">

    7. Create all your entity definitions. When you are done, close the DOCTYPE definition with a square bracket and a greater than sign: ]>

    Using Entities

    To use an entity in your document, just call it by name. The name begins with an & and ends with a semi-colon.

    Here is an example document that defines two entities and then calls them by name:

    <!DOCTYPE type-of-doc [
    <!ENTITY trademark "…">
    <!ENTITY copyright "…">
    ]>



    Mini-globe revolutionizes keychain industry


    Today As The World Spins introduces a new approach to key chains. With the new MINI-GLOBE keys can be kept inside a chain, called for upon demand, and stored safely. Never more will consumers lose a key or stand at a door flipping through a stack of keys seeking the right one.


    &trademark;
    &copyright;
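
The sketch below defines two entities locally and lets Python's xml.etree.ElementTree expand them. The press-release and footer names and both entity values are hypothetical; only the trademark and copyright entity names and the headline come from the example above.

```python
import xml.etree.ElementTree as ET

# The press-release and footer names and both entity values are hypothetical.
RELEASE_XML = """<?xml version="1.0"?>
<!DOCTYPE press-release [
  <!ENTITY trademark "MINI-GLOBE is a trademark of As The World Spins.">
  <!ENTITY copyright "Copyright 1999 As The World Spins.">
]>
<press-release>
  <title>Mini-globe revolutionizes keychain industry</title>
  <footer>&trademark; &copyright;</footer>
</press-release>
"""

release = ET.fromstring(RELEASE_XML)

# Each entity reference is replaced by its full value during parsing.
footer = release.findtext("footer")
print(footer)
```

Changing the wording of either notice means editing one ENTITY line; every place that calls the entity picks up the new value the next time the document is processed.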
