There are several steps to working with an XML document. This section focuses on the process of creating and tagging a document. You’ll need to follow the syntax rules of XML; if you know HTML, they will feel quite familiar.
Before You Begin
There are a handful of terms you’ll be hearing as you work with an XML document. Take a couple of minutes to become familiar with them before you begin.
Element | Attribute | Tag |
Attribute value | Declaration | DTD |
The XML Document
An XML file is a plain text file with XML markup tags. It has a .xml extension, like this: booklist.xml
Inside an XML File
An XML file contains three basic parts:
- A declaration that announces that this is an XML file;
- An optional definition about the type of document it is and what DTD it follows;
- Content marked up with XML tags.
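As a minimal sketch of those three parts in one file, assuming a hypothetical booklist.dtd and invented element names (booklist, book, title), a small booklist.xml might look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Part 1 (above): the XML declaration. -->
<!-- Part 2 (below): the optional document type definition; "booklist.dtd" is a hypothetical file. -->
<!DOCTYPE booklist SYSTEM "booklist.dtd">
<!-- Part 3: content marked up with XML tags. -->
<booklist>
  <book>
    <title>An Example Title</title>
  </book>
</booklist>
```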
Types of XML Documents
There are two types of XML documents: well-formed and valid. Both follow XML syntax; the difference between the two is that a valid document is also checked against a DTD and a well-formed one is not.
Well-formed
Well-formed documents conform with XML syntax. They contain text and XML tags. Everything is entered correctly. They do not, however, refer to a DTD.
Valid
Valid documents not only conform to XML syntax but they also are error checked against a Document Type Definition (DTD). A DTD is a set of rules outlining which tags are allowed, what values those tags may contain, and how the tags relate to each other.
Typically, you’ll use a valid document when you have documents that require error checking, that use an enforced structure, or are part of a company- or industry-wide environment in which many documents need to follow the same guidelines.
DTDs
A Document Type Definition (DTD) is a set of rules that defines the elements, element attributes and attribute values, and the relationship between elements in a document.
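To make this concrete, here is a sketch of what a small DTD could contain. The file name booklist.dtd and every element and attribute name in it are assumptions made up for illustration, not part of any published DTD.

```xml
<!-- booklist.dtd: a hypothetical DTD, invented for this sketch -->

<!-- Relationship between elements: a booklist contains one or more book elements -->
<!ELEMENT booklist (book+)>

<!-- Each book holds exactly one title, followed by one or more authors -->
<!ELEMENT book (title, author+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>

<!-- Attribute with a fixed list of allowed values; it is required on every book -->
<!ATTLIST book format (hardcover | paperback) #REQUIRED>

<!-- Optional free-text attribute on author -->
<!ATTLIST author dob CDATA #IMPLIED>
```

A document validated against this sketch would be rejected if, for example, a book had no title or used a format value other than the two listed.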
When your XML document is processed, it is compared to its associated DTD to be sure it is structured correctly and all tags are used in the proper manner. This comparison process is called validation and is performed by a tool called a parser.
Remember, you don’t need to have a DTD to create an XML document; you only need a DTD for a valid XML document.
Here are a few reasons you’d want to use a DTD:
- Your document is part of a larger document set and you want to ensure that the whole set follows the same rules.
- Your document must contain a specific set of data and you want to ensure that all required data has been included.
- Your document is used across your industry and needs to match other industry-specific documents.
- You want to be able to error check your document for accuracy of tag use.
Deciding on a DTD
Using a DTD doesn’t necessarily mean you have to create one from scratch. There are a number of existing DTDs, with more being added every day.
Shared DTDs
As XML becomes widespread, your industry association or company is likely to have one or more published DTDs that you can use and link to. These DTDs define tags for elements that are commonly used in your applications. You don’t need to recreate these DTDs; you just point to them in the DOCTYPE definition in your XML file and follow their rules when you create your XML document.
Some of these DTDs may be public DTDs, like the HTML DTD. Others may belong to your company. If you are interested in using a DTD, ask around and see if there is a good match that already exists.
Create Your Own DTD
Another option is to create your own DTD. The DTD can be very simple and basic or it can be large and complex. The DTD will be a reflection of the needs of your document.
It is perfectly acceptable to have a DTD with just four or five basic elements if that is what your document needs. Don’t feel that creating a DTD necessarily needs to be a huge undertaking.
However, if your documents are complex, do plan on setting aside time (several days or even several weeks) to understand the document and the document elements and create a solid DTD that will really work for you over time.
Make an Internal DTD
You can insert DTD data within your DOCTYPE definition. If you’ve worked with CSS styles, you can think of this as being a little like putting style data into your file header. DTDs inserted this way are used only in that specific XML document. This might be the approach to take if you want to validate the use of a small number of tags in a single document or to make elements that will be used only for one document.
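A minimal sketch of an internal DTD follows. The memo document type and its elements are hypothetical; the declarations sit between square brackets inside the DOCTYPE definition and apply only to this file.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE memo [
  <!-- These declarations are local to this one document -->
  <!ELEMENT memo (to, from, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<memo>
  <to>Editorial staff</to>
  <from>Thadius J. Frog</from>
  <body>The book list is ready for review.</body>
</memo>
```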
Remember, the primary use for a DTD is to validate that the tags you enter in your XML document are entered as specified in the DTD. It is an error-checking process that ensures your data conforms to a set of rules.
XML Syntax
Tagging an XML document is, in many ways, similar to tagging an HTML document. Here are some of the most important guidelines to follow.
Rule #1: Remember the XML Declaration
This declaration goes at the beginning of the file and alerts the browser or other processing tools that this document contains XML tags. The declaration looks like this:
<?xml version="1.0" encoding="UTF-8"?>
You can leave out the encoding attribute and the processor will use the UTF-8 default.
Rule #2: Do What the DTD Instructs
If you are creating a valid XML file, one that is checked against a DTD, make sure you know what tags are part of the DTD and use them appropriately in your document. Understand what each does and when to use it. Know what the allowable values are for each. Follow those rules, and the XML document will validate against the specified DTD.
Rule #3: Watch Your Capitalization
XML is case-sensitive. <P> is not the same as <p>. Be consistent in how you define element names. For example, use ALL CAPS, or use Initial Caps, or use all lowercase. It is very easy to create mismatched-case errors.
Also, make sure starting and ending tags use matching capitalization. If you start a paragraph with the <P> tag, you must end it with the </P> tag, not with </p>.
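A short illustration of the capitalization rule, using hypothetical element names:

```xml
<booklist>
  <!-- Correct: the start tag and end tag match exactly, including case -->
  <Title>An Example Title</Title>
  <!-- Incorrect (a parser would reject this): <Title>An Example Title</title> -->
</booklist>
```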
Rule #4: Quote Attribute Values
In HTML there is some confusion over when to enclose attribute values in quotes. In XML the rule is simple: enclose all attribute values in quotes.
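As a small illustration, using a hypothetical authors wrapper element, either double or single quotes will do, as long as every value is quoted:

```xml
<authors>
  <!-- Double quotes around the attribute value -->
  <author dob="1864">Thadius J. Frog</author>
  <!-- Single quotes are also allowed; an unquoted value such as dob=1864 is an error -->
  <author dob='1864'>Thadius J. Frog</author>
</authors>
```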
Rule #5: Close All Tags
In XML you must close all tags. This means that paragraphs must have corresponding end paragraph tags. Anchor names must have corresponding anchor end tags. A strict interpretation of HTML says we should have been doing this all along, but in reality, most of us haven’t.
Rule #6: Close Empty Tags, Too
In HTML, empty tags, such as <br> or <img>, do not close. In XML, empty tags do close. You can close them either by adding a separate close tag (<br></br>) or by combining the open and close tags into one tag. You create the open/close tag by adding a slash, /, to the end of the tag, like this:
<br/>
This table shows some common HTML tags and how they would be treated in XML.
Tag | Comment | End-Tag |
<p> | Technically, in HTML, you’re supposed to close this tag. In XML, it’s essential to close it. | </p> |
 | All elements in XML must have a start-tag and an end-tag. | |
 | This tag must be closed in XML in order to ensure a well-formed XML document. | |
<meta> | META tags are considered empty elements in XML, and they must close. | <meta/> |
<br> | Break tags are considered empty elements. | <br/> |
 | This is an empty element tag. | |
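As a sketch of how familiar HTML-style markup looks once every tag is closed (the text content and the file name example.gif are placeholders chosen for illustration):

```xml
<body>
  <!-- Paragraphs and list items get explicit end tags -->
  <p>First paragraph, with its end tag.</p>
  <ul>
    <li>Each list item is closed.</li>
  </ul>
  <!-- Empty elements close themselves with a trailing slash -->
  <p>Line one<br/>line two</p>
  <img src="example.gif" alt="Example image"/>
</body>
```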
Well-formed XML
A document that conforms to the XML syntax rules is called “well-formed.” If all your tags are correctly formed and follow XML guidelines, then your document is considered a well-formed XML document. That’s one of the nice things about XML: you don’t need to have a DTD in order to use it.
To begin a well-formed document, type the XML declaration:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
If you are embedding XML, it will go after the <HTML> and <HEAD> tags, and before any JavaScript. If you are creating an XML-only document, it will be the first thing in the file.
Version
You must include the version attribute for the XML declaration. The version is currently “1.0.” Defining the version lets the browser know that the document that follows is an XML document, using XML 1.0 structure and syntax.
Standalone
The next step is to declare that the document “stands alone” by setting standalone="yes". The application that is processing this document then knows that it doesn’t need to look for a DTD and validate the XML tags.
Encoding
Finally, declare the encoding of the document. In this case, the encoding is UTF-8, which is the default encoding for XML. You can leave off this attribute and the processor will default to UTF-8. When you do include it, it must appear after version and before standalone in the declaration.
Remember the Root Element
After the declaration, enter the tag for the root element of your document. This is the top-most element, under which all elements are grouped.
Follow XML Syntax
Now, enter the rest of your content. Remember to follow XML syntax:
- Remember that capitalization matters;
- Quote all attribute values;
- Close all tags;
- Remember to close empty tags too, like this: <br/>
Pretty easy, isn’t it? That’s all there is to it!
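Putting the steps together, a complete well-formed document might look like the sketch below. The element names, attribute names, and the out-of-print empty element are assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Root element: everything else is nested inside it -->
<booklist>
  <!-- Attribute values are quoted; start and end tags match in case -->
  <book format="paperback">
    <title>An Example Title</title>
    <author dob="1864">Thadius J. Frog</author>
    <!-- Empty element, closed with a trailing slash -->
    <out-of-print/>
  </book>
</booklist>
```

There is no DOCTYPE definition here, so a parser only checks that the syntax rules are followed.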
Valid XML
A valid document conforms to the XML syntax rules and follows the guidelines of a Document Type Definition (DTD).
The process of comparing the XML document to the DTD is called validation. This process is performed by a tool called a parser.
Begin the Valid XML Document
To begin a valid document, type the XML declaration:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
If you are embedding XML, it will go after the <HTML> and <HEAD> tags, and before any JavaScript. If you are creating an XML-only document, it will be the first thing in the file.
Version
You must include the version attribute for the XML declaration. The version is currently “1.0.” Defining the version lets the browser know that the document that follows is an XML document, using XML 1.0 structure and syntax.
Standalone
The standalone="no" attribute tells the processor that the document depends on an external DTD, so a validating parser must find that DTD and check the XML tags against it.
Encoding
Finally, declare the encoding of the document. You can leave off this attribute and the processor will default to UTF-8.
The second element in a valid XML document is the DOCTYPE definition. This identifies the type of document and DTD in use.
If you look at HTML source files, you’ll often see a !DOCTYPE definition, especially if the file was created by a WYSIWYG tool. The DOCTYPE definition points to an HTML DTD.
In a valid XML file, !DOCTYPE tells the program that is processing your XML file two things: the name of the type of document and the name and location of the DTD against which to validate the file’s contents.
The DOCTYPE definition looks like this:
<!DOCTYPE type-of-doc SYSTEM/PUBLIC "dtd-name">
!DOCTYPE
This says that you are defining the DOCTYPE.
type-of-doc
This is the name of the type of document contained in this file. Typically, this is the same name as the DTD.
SYSTEM/PUBLIC
SYSTEM tells the processor to look for the private DTD at the following location. PUBLIC tells the processor to look for a public DTD at the following location.
"dtd-name"
The URL after SYSTEM or PUBLIC gives the name and location of the DTD file. By convention, DTD files end with the extension .dtd.
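For illustration, here are the two forms side by side. The first points at a hypothetical private booklist.dtd; the second is the published DOCTYPE for XHTML 1.0 Strict, where PUBLIC is followed by a public identifier and then the DTD location. A real document would contain only one DOCTYPE definition.

```xml
<!-- Alternative 1, private DTD: SYSTEM is followed by the location of the DTD file -->
<!DOCTYPE booklist SYSTEM "booklist.dtd">

<!-- Alternative 2, public DTD: PUBLIC is followed by a public identifier, then the location -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```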
If you want, instead of pointing to an external DTD, you could place the DTD information within the DOCTYPE definition, making it local to your XML document. You should do this only if you want to define a few simple elements and you want them permanently attached to a particular document.
Remember the Root Element
After the declaration, enter the tag for the root element of your document. This is the top-most element, under which all elements are grouped.
Now, enter the rest of your content. Remember to follow XML syntax:
- Remember that capitalization matters;
- Quote all attribute values;
- Close all tags;
- Remember to close empty tags too, like this: <br/>
Elements
Elements are the basic building blocks of XML (and HTML, for that matter). Each element is a piece of data, identified by a tag. The tag contains the name of the element and any of its attributes, like this:
<author dob="1864">Thadius J. Frog</author>
Thadius J. Frog is now identified as an author element. This particular author element has a date of birth (dob) attribute value of 1864.
Choose Your Own
XML is an extensible markup language. This means you create a set of elements that work for your content, and that you’ll be able to use consistently within the document.
Whether you use a DTD or not, you’ll still want to sit down and write a list of the element names that you will be using in your document. XML is case-sensitive, so as you’re thinking about the element names, be sure to think about how you capitalize them also.
Select names that are both easy to remember and easy to type. Ideally, your tags should have some inherent meaning too. This makes them easier to use. For example, if you want to identify “last name” as an element, consider naming the element something like “last-name” or “surname.”
Be consistent in your use of names. It is easier to apply one set of general rules to 20 different tags than it is to remember eight discrete tags that follow no particular pattern. For example, if your document is a listing of classes, you could use these elements:
But you’re just asking for confusion!
There’s a mix of capitalization. There’s a mix of abbreviation and full words. In one case the phrase “name” is the first part of tag; in the other it is the second part of the tag. It isn’t logical to remember this set of names.
Wouldn’t names like this be easier to use?
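As a hypothetical illustration of that contrast, the sketch below marks up the same class twice. Every element name here is invented for this example: the first entry mixes capitalization, abbreviations, and word order, and the second follows one pattern throughout.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- All element names below are hypothetical, invented for this sketch. -->
<class-list>
  <!-- Inconsistent: mixed case, mixed abbreviations, "name" in different positions -->
  <class>
    <CLASSNAME>Introduction to XML</CLASSNAME>
    <Name-Instr>Thadius J. Frog</Name-Instr>
    <loc>Room 12</loc>
  </class>
  <!-- Consistent: all lowercase, full words, "name" always last -->
  <class>
    <class-name>Introduction to XML</class-name>
    <instructor-name>Thadius J. Frog</instructor-name>
    <location-name>Room 12</location-name>
  </class>
</class-list>
```

Both entries are well-formed, so a parser will not flag the first one; the cost shows up only for the person who has to remember and type the names.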