Valid XML documents follow a set of rules defined in an associated DTD. This Document Type Definition defines elements, attributes, and the relationships between elements.
DTDs are saved as ASCII text files with the extension .dtd.
When your XML document is processed, it is compared to its associated DTD to be sure it is structured correctly and all tags are used in the proper manner. This comparison process is called validation and is performed by a tool called a parser.
Remember, you don’t need to have a DTD to create an XML document; you only need a DTD for a valid XML document.
Before You Begin
There are a handful of terms you’ll be hearing as you work with an XML DTD. Take a couple of minutes to become familiar with them before you begin:
- Document Tree
- Root Element
- Parent Element
- Child Element
A DTD is a way to ensure that an XML document uses elements correctly. It contains the set of rules that the document is checked against when it is validated. A DTD:
- Always contains rules that define elements.
- Always contains rules that define the relationship between elements.
- May contain rules that define attributes for elements, although not all elements have attributes.
- May contain rules that define entities.
- May contain rules that define notations.
Finding a DTD
Using a DTD doesn’t necessarily mean you have to create one from scratch. There are a number of existing DTDs, with more being added every day.
As XML becomes widespread, your industry association or company is likely to have one or more published DTDs that you can use and link to. These DTDs define tags for elements that are commonly used in your applications. You don’t need to recreate these DTDs; you just point to them in the DOCTYPE declaration in your XML file and follow their rules when you create your XML document.
Some of these DTDs may be public DTDs, like the HTML DTD. Others may belong to your company. If you are interested in using a DTD, ask around and see if there is a good match that already exists.
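As a rough sketch of what pointing to an existing DTD looks like, the DOCTYPE declaration below links a document to an external DTD file. The root element BOOKLIST and the file name booklist.dtd are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0"?>
<!-- SYSTEM gives the location of the DTD; "booklist.dtd" is an assumed file name -->
<!DOCTYPE BOOKLIST SYSTEM "booklist.dtd">
<BOOKLIST>
  <!-- content here must follow the rules declared in booklist.dtd -->
</BOOKLIST>
```

Public DTDs are referenced in the same place, using the PUBLIC keyword followed by a public identifier and then the location of the DTD.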
Create Your Own External DTD
Another option is to create your own DTD. The DTD can be very simple and basic or it can be large and complex. The DTD will be a reflection of the needs of your document.
It is perfectly acceptable to have a DTD with just four or five basic elements if that is what your document needs. Don’t feel that creating a DTD necessarily needs to be a huge undertaking.
However, if your documents are complex, do plan on setting aside time — several days or several weeks — to understand the document and the document elements and create a solid DTD that will really work for you over time. Remember, you’ll be able to use this DTD with many individual documents, so it is worth the time to think it through and craft it well.
Create Your Own Internal DTDs
You can insert DTD data within the DOCTYPE declaration of an individual XML document. If you’ve worked with CSS styles, you can think of this as being a little like putting style data into your file header.
DTDs inserted this way are used in that specific XML document only. This might be the approach to take if you want to validate the use of a small number of tags in a single document or to make elements that will be used only for one document.
You can insert DTD data within your DOCTYPE declaration, between square brackets. This type of DTD is used only by the one specific XML document that contains it.
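A minimal sketch of DTD data placed inside the DOCTYPE declaration might look like this. The MEMO document and its element names are assumptions made up for the illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE MEMO [
  <!-- The DTD data sits between the square brackets -->
  <!ELEMENT MEMO (TO, FROM, BODY)>
  <!ELEMENT TO (#PCDATA)>
  <!ELEMENT FROM (#PCDATA)>
  <!ELEMENT BODY (#PCDATA)>
]>
<MEMO>
  <TO>All staff</TO>
  <FROM>Editorial</FROM>
  <BODY>The style guide has been updated.</BODY>
</MEMO>
```

The name after DOCTYPE (MEMO here) must match the root element of the document.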
DTDs are stored as ASCII text files with the extension .dtd. Each file includes a series of element definitions, attribute lists, entity definitions, and notation definitions; the DOCTYPE declaration that points to the file belongs in the XML document itself. A DTD for a set of documents about books, for example, would define the elements that make up a book list.
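Here is one way a small DTD for book documents could be written, as a sketch only. The file name booklist.dtd, the element and attribute names, and the entity text are all assumptions.

```xml
<!-- booklist.dtd: an illustrative DTD for documents about books -->
<!ELEMENT BOOKLIST (BOOK+)>
<!ELEMENT BOOK (TITLE, AUTHOR+, PUBLISHER?)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT AUTHOR (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!-- An attribute list for BOOK -->
<!ATTLIST BOOK isbn CDATA #IMPLIED>
<!-- An entity that documents can call as &copyright; -->
<!ENTITY copyright "Copyright Example Press. All rights reserved.">
```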
DTDs can be much more complex than a simple book list, and they typically are, but even a small one gives you a sense of what they can do. It’s just a matter of structuring your data and figuring out the “parts” of your content.
Reading a DTD
Even if you don’t plan to build a DTD from scratch, it is helpful to know how to read one and to understand the document it is describing.
From reading a DTD you should be able to compile a list of elements and their attributes, and learn how and when to use them. You should also be able to compile a list of entities that you can use within the document.
Some people find it helpful to actually sketch out a document tree as they go through the DTD, to visualize the structure of the document.
Here’s a list of things to look for as you go through a DTD:
Read the Comments
Read the comments! Comments can tell you a lot about the DTD, how to use it, and what to be aware of when using it.
Most DTD authors will include information that you should know before using the DTD. This might range from use restrictions to how-to information.
Comments begin with <!-- and end with -->.
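As an illustration, a comment at the top of a DTD might read as follows. The file name and the wording of the notes are invented for this sketch.

```xml
<!-- booklist.dtd
     Use for book list documents only.
     Element names are case-sensitive; type them in capitals. -->
<!ELEMENT BOOKLIST (BOOK+)>
```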
Note the Basic Elements
Look through the DTD and identify the element names that make up the document. Note how they are capitalized. You might want to develop a reference sheet of elements that you can make notes on as you work your way through the DTD.
Element declarations begin with <!ELEMENT.
The text immediately after the element declaration is the element’s name.
Read the Element Declaration
Each element declaration provides the name of the element and the content that it contains. Sometimes the content is text. Other times it is other elements, arranged in a certain order or used a certain number of times.
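The fragment below sketches how a few element declarations can be read. The element names are hypothetical, and the declarations for the child elements are left out to keep it short.

```xml
<!-- NAME contains text only -->
<!ELEMENT NAME (#PCDATA)>
<!-- ADDRESS contains STREET, then CITY, then an optional ZIP, in that order -->
<!ELEMENT ADDRESS (STREET, CITY, ZIP?)>
<!-- PHONELIST contains one or more PHONE elements -->
<!ELEMENT PHONELIST (PHONE+)>
```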
Look for Parent/Child Relationships
The element rules build a hierarchy of elements, describing how one element is related to another. An element that is contained within another is considered a child of the element in which it is contained. Use these relationships to sketch out your document tree.
The parent/child relationship is defined in the content type portion of the element definition. If the content type is another element, then those elements are children of the element whose definition you are reading. For example: FIRST, MI, and LAST are children of EMPLOYEE:
<!ELEMENT EMPLOYEE (FIRST, MI, LAST)>
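To see how the EMPLOYEE rule maps onto a document tree, here is a small sketch of a matching document. It assumes each child element holds text only, which the rule above does not state, and the employee name is invented.

```xml
<?xml version="1.0"?>
<!DOCTYPE EMPLOYEE [
  <!ELEMENT EMPLOYEE (FIRST, MI, LAST)>
  <!-- Assumed here: each child holds text only -->
  <!ELEMENT FIRST (#PCDATA)>
  <!ELEMENT MI (#PCDATA)>
  <!ELEMENT LAST (#PCDATA)>
]>
<EMPLOYEE>
  <FIRST>Maria</FIRST>
  <MI>L</MI>
  <LAST>Ortiz</LAST>
</EMPLOYEE>
```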
Read Attribute Lists
After element definitions, you may see attribute lists. An attribute list begins with <!ATTLIST.
Each attribute list defines the attributes for an element. Many attributes may be defined in one ATTLIST.
The ATTLIST is structured like this: the ATTLIST declaration, the element name, the attribute name, the attribute type, and the default value.
See Which Element the Attribute Defines
Right after the ATTLIST declaration is the name of an element. This is the element whose attributes the list defines. For example, an ATTLIST that begins with <!ATTLIST COMMENT defines attributes for the COMMENT element.
Find Attribute Names for Each Element
Following the element name is the name of the first attribute declared in this list. This name is the attribute name you type into the element tag in the XML file. For example, an ATTLIST that begins with <!ATTLIST COMMENT category defines the attribute “category” for the element COMMENT.
Add the attribute information to the element reference list you are building.
Determine Attribute Value Types
Attributes can be one of several different types. The attribute type describes the type of value that the attribute may contain. For example, the attribute type (red | green | blue | other) says that the “category” attribute for the element COMMENT contains one of four values: red, green, blue, or other.
See the Attribute’s Default
The final part of the ATTLIST is the default value of the attribute. The default value has a strong effect on how the attribute is used and what values it might have if you don’t use it in the XML tag. You can make the value required (#REQUIRED) or optional (#IMPLIED). Or, you can provide a default value that will be used automatically if the attribute is not entered.
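Putting the parts together, the attribute list for COMMENT could be sketched as below. The default value "other" and the content of the COMMENT element are assumptions; the text above only fixes the element name, the attribute name, and the four allowed values.

```xml
<!ELEMENT COMMENT (#PCDATA)>
<!--      element  attribute  allowed values                default -->
<!ATTLIST COMMENT  category   (red | green | blue | other)  "other">
```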
Read Entity Declarations
Along with element and attribute definitions, you may also see entity definitions. Typically, these will appear in a group, often at the beginning of the DTD, and usually with explanatory comments.
An entity definition begins with <!ENTITY.
After the declaration come the entity’s name and the contents of the entity. The contents may be text, or they may be a pointer to an external file. For example, a DTD might define two entities, one called “copyright” and one called “trademark,” with copyright defined within the definition and trademark pointing to another file.
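The two entity definitions just described might be written roughly like this. The entity text and the file name trademark.txt are placeholders, not taken from any real DTD.

```xml
<!-- Content defined within the definition itself -->
<!ENTITY copyright "Copyright Example Press. All rights reserved.">
<!-- Content stored elsewhere: SYSTEM points to the external file -->
<!ENTITY trademark SYSTEM "trademark.txt">
```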
Elements are the basic building blocks of XML. You define elements in a DTD; you use them in a document. A basic element definition has three parts: the element declaration, the element’s name, and a rule that describes what the element can contain.
Each element definition begins with an element declaration, <!ELEMENT.
After the declaration is the element’s name. The way the name appears in the element definition is exactly the way it must be used in the XML document. Capitalization counts!
After the name comes a rule that describes what the element can contain. Through this description, the elements take on hierarchical relationships with each other.
Although the basic bits of the rules are simple, they can be grouped and combined to create quite complex definitions.
The following sections summarize the element rule definitions.
Elements can contain text, other elements, a combination of text and other elements, or they may be empty.
Text. Elements can contain textual data.
Other Elements. Elements can contain only other specified elements and no text. The contained elements are called children of the containing element. The containing element is the parent of the child elements.
Combination. Elements can contain a mix of textual data and other specific elements.
Empty. Empty elements get their value from their attributes. An empty element will typically have at least one attribute. In HTML, the IMG tag is a good example of an empty element. It gets its value from the src attribute.
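The four kinds of content can be sketched as element declarations like these. All element and attribute names here are hypothetical.

```xml
<!-- Text: TITLE holds character data only -->
<!ELEMENT TITLE (#PCDATA)>
<!-- Other elements: BOOK is the parent of TITLE and AUTHOR -->
<!ELEMENT BOOK (TITLE, AUTHOR)>
<!ELEMENT AUTHOR (#PCDATA)>
<!-- Combination: PARA mixes text with EMPH child elements -->
<!ELEMENT PARA (#PCDATA | EMPH)*>
<!ELEMENT EMPH (#PCDATA)>
<!-- Empty: COVER has no content and gets its value from its src attribute -->
<!ELEMENT COVER EMPTY>
<!ATTLIST COVER src CDATA #REQUIRED>
```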
Number of Occurrences
You can specify the number of times a child element is used within its parent.
Once and only once. The element listed by itself indicates that it must be used once and only once.
At least once, or many times. The element followed by a plus sign (+) indicates that this element must be used at least once and can be used many times within the parent.
Once or not at all. The element followed by a question mark (?) indicates that this element can be used either one time or not at all.
Once, not at all, or as many times as you want. The element followed by an asterisk (*) indicates that this element can be used as many times as needed, or left out entirely.
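The sketch below combines all four occurrence rules in one declaration and shows a document that satisfies them. The BOOK element and its children are made-up names.

```xml
<?xml version="1.0"?>
<!DOCTYPE BOOK [
  <!-- TITLE: once and only once.  SUBTITLE?: once or not at all.
       AUTHOR+: at least once.     REVIEW*: any number of times, including none. -->
  <!ELEMENT BOOK (TITLE, SUBTITLE?, AUTHOR+, REVIEW*)>
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT SUBTITLE (#PCDATA)>
  <!ELEMENT AUTHOR (#PCDATA)>
  <!ELEMENT REVIEW (#PCDATA)>
]>
<!-- Used in document: no SUBTITLE, two AUTHORs, no REVIEW -->
<BOOK>
  <TITLE>Learning DTDs</TITLE>
  <AUTHOR>A. Writer</AUTHOR>
  <AUTHOR>B. Editor</AUTHOR>
</BOOK>
```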
You can specify the order in which child elements appear.
Specific order. Child elements can be defined to be used in a specific order. The comma (,) separates elements that are listed in a specific order. For example, you could set a rule that creates an EVENTLIST. In the list, you must always use the EVENT element, followed by the SPONSOR element; the rule for this content is written (EVENT, SPONSOR).
Either/or. You can define child elements so that one or another can be used. The bar (|) separates either/or choices.
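Both ordering rules can be seen in this small sketch. The EVENTLIST, EVENT, and SPONSOR names follow the text above; COMPANY and PERSON are added assumptions to give the either/or rule something to choose between.

```xml
<?xml version="1.0"?>
<!DOCTYPE EVENTLIST [
  <!-- Specific order: EVENT must come first, followed by SPONSOR -->
  <!ELEMENT EVENTLIST (EVENT, SPONSOR)>
  <!ELEMENT EVENT (#PCDATA)>
  <!-- Either/or: a SPONSOR holds a COMPANY or a PERSON, not both -->
  <!ELEMENT SPONSOR (COMPANY | PERSON)>
  <!ELEMENT COMPANY (#PCDATA)>
  <!ELEMENT PERSON (#PCDATA)>
]>
<EVENTLIST>
  <EVENT>Spring Book Fair</EVENT>
  <SPONSOR><COMPANY>Example Press</COMPANY></SPONSOR>
</EVENTLIST>
```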
Groups can be used to create complex rules that combine elements and different usage options.
For example, when groups are combined with a “use many times” symbol, you can create a rule that allows multiple uses of elements, either in any order or as repeated sets. With a rule such as (EVENT, SPONSOR)+, the element EVENTLIST can contain multiple sets of EVENT and SPONSOR groups.
With a rule such as (EVENT | SPONSOR)+, the EVENTLIST can contain either the EVENT element or the SPONSOR element, but this either/or group can be used many times.
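The two grouping rules might be written as follows. This is a sketch that assumes the plus sign as the “use many times” symbol; the second rule is shown inside a comment because a DTD may declare a given element only once.

```xml
<!-- Repeated sets: each EVENT is followed by its SPONSOR, and the pair can repeat.
     Matches: EVENT SPONSOR EVENT SPONSOR -->
<!ELEMENT EVENTLIST (EVENT, SPONSOR)+>

<!-- Alternative rule for the same element: EVENT and SPONSOR in any order and mix.
     Matches: EVENT EVENT SPONSOR EVENT
<!ELEMENT EVENTLIST (EVENT | SPONSOR)+>
-->
```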
Hints for Element Names
- Select names that are both easy to remember and easy to type.
- Give your tags some inherent meaning. For example, if you want to identify “last name” as an element, consider naming the element something like “last-name” or “surname.”
- Use names that are consistent with current processes. If people call “social security number” SSN, create an element called SSN. Don’t create an unfamiliar “socsecnum” element.
- Be consistent in your use of names. It is easier to apply one set of general rules to 20 different tags than it is to remember eight discrete tags that follow no particular pattern.
Elements can have attributes, which describe the element in more detail. When you create an element in your DTD, you can also create an attribute list for the element.
Attribute lists define the name, data type, and default value (if any) of each attribute associated with an element.
In this very simple example, we’re adding some attributes to the title element from our book list. We want to be able to specify the edition date and whether the book is paperback or hardcover.
In the XML file, each title tag carries the edition attribute, and a title tag can also use the type attribute to indicate, for instance, that the book is a hardcover title.
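A sketch of such an attribute list, together with a document that uses it, could look like this. The attribute names edition and type come from the text; the allowed values paper and hard, the choice of #REQUIRED and #IMPLIED, and the book titles are assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE BOOKLIST [
  <!ELEMENT BOOKLIST (TITLE+)>
  <!ELEMENT TITLE (#PCDATA)>
  <!-- edition: free text, must always be supplied
       type: paper or hard, may be left out -->
  <!ATTLIST TITLE
            edition CDATA          #REQUIRED
            type    (paper | hard) #IMPLIED>
]>
<BOOKLIST>
  <TITLE edition="1998">XML for Beginners</TITLE>
  <TITLE edition="1999" type="hard">Understanding DTDs</TITLE>
</BOOKLIST>
```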
Attributes can have one of several different types of data, but the two most common are:
CDATA. Character data. This allows the attribute value to be textual data. You use it by typing the keyword CDATA after the attribute name.
Pre-defined values. You can list a set of specific values that the attribute can have. The value set is enclosed in parentheses and each value is separated with a vertical bar, as in (red | green | blue | other).
You can specify a default value for the attribute, or make the attribute required or optional. The default value has a strong effect on how the attribute is used and what values it might have if you don’t use it in the XML tag.
#REQUIRED: the attribute must have a value every time the element is listed. You specify that an attribute is required by typing #REQUIRED after the attribute type.
#IMPLIED: the processor ignores this attribute unless it is used as part of the element. It does not assume any default value.
#FIXED value: the attribute is not required for the element, but if it occurs, it must have the specified value. For example, #FIXED "yes" on an attribute called new means that if the new attribute is used, it must have the value of “yes.”
defaultvalue: a quoted value on its own provides a default value for that attribute. If the attribute is not included in the element, the processing program assumes that this is the attribute’s value. For example, "hard" gives the type attribute a default value of “hard.”
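The four kinds of default can be placed side by side in one attribute list, as in this sketch. The new and type attributes follow the examples in the text; the edition and notes attributes and the value set for type are assumptions.

```xml
<!ELEMENT TITLE (#PCDATA)>
<!-- edition: required.  notes: optional, no default.
     new: if used, must be "yes".  type: defaults to "hard". -->
<!ATTLIST TITLE
          edition CDATA          #REQUIRED
          notes   CDATA          #IMPLIED
          new     CDATA          #FIXED "yes"
          type    (paper | hard) "hard">
```

With this list, a processor that reads the DTD treats a title tag written with only the edition attribute as if type="hard" had been typed as well.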
An entity is a short cut to a set of information.
When you use an entity, it “expands” to its full meaning, but you need only type the shorter entity name during data entry. You might think of an entity as being a bit like a macro — it is a set of information that can be used by calling one name.
XML defines two types of entities.
The general entity is one that you define in a DTD and use in a document. General entities are easy to spot. They are defined with the entity declaration, <!ENTITY, and called with an ampersand and a semicolon, like this: &entity-name;
The parameter entity is one that you define and use within a DTD. The content of a parameter entity may be either included in the DTD or stored in an external file. In addition, parameter entities must be parsed; they cannot be unparsed. That is, they must contain textual data that is processed rather than a GIF or other non-textual data type.
It too is defined with an entity declaration, but it is called with a percent sign, like this: %entity-name;
Defining a General Entity
To define an entity:
- Start the entity definition with a less than sign, an exclamation mark, and the word ENTITY, all in caps: <!ENTITY
- Type the name of the entity. Type it using the capitalization that you will use when calling it later on.
- If you are defining the entity locally, type the value of the entity, surrounded by quotes, and then close the entity definition with a greater than sign.
- If you are defining an entity in an external ASCII text file, put in a pointer to the external file (the keyword SYSTEM followed by the file location in quotes), then close the entity definition with a greater than sign.
Using a General Entity
You won’t be using a general entity in a DTD. You will only be defining it here. You will be using it in an XML file, where it is called by typing an ampersand, the entity name, and a semicolon: &entity-name;
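Defining and then calling a general entity can be sketched in a single document with an internal DTD. The entity name publisher, its value, and the NOTICE element are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE NOTICE [
  <!ELEMENT NOTICE (#PCDATA)>
  <!-- Defined locally: the name, then the value in quotes -->
  <!ENTITY publisher "Example Press, Inc.">
]>
<!-- The entity call below expands to the full value when the document is parsed -->
<NOTICE>Published by &publisher;</NOTICE>
```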
Defining a Parameter Entity
To declare a parameter entity:
- Type the entity declaration: <!ENTITY
- Type a space, followed by a percent sign. It is important to remember the space!
- Type another space, followed by the name of the entity.
- Type the value of the entity, surrounded by quotation marks.
- End the declaration with a greater than sign (>).
One thing to notice about parameter entities in a DTD is that when they are defined there is a space between the percent sign and the entity name, but when the entity is used there is no space between the percent sign and the entity name.
Using a Parameter Entity
It is quite simple to use a parameter entity. Simply enter the entity name, preceded by a percent sign and followed by a semicolon, like this: %entity-name;
When the DTD is processed, the entity will be expanded. For example, the entity reference %info; would be replaced with the set of attribute data defined in the info entity declaration.
Again, remember that when a parameter entity is defined, there is a space between the percent sign and the entity name, but when the entity is used there is no space between the percent sign and the entity name.
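Here is a sketch of an info parameter entity being defined and then used in two attribute lists. It is meant as the content of an external .dtd file, because a parameter entity reference inside another declaration is only allowed in an external DTD. The attribute data and the BOOK and ARTICLE elements are assumptions.

```xml
<!-- Defined: note the spaces on both sides of the percent sign -->
<!ENTITY % info "id   ID    #REQUIRED
                 lang CDATA #IMPLIED">

<!ELEMENT BOOK (#PCDATA)>
<!ELEMENT ARTICLE (#PCDATA)>

<!-- Used: no space between the percent sign and the name -->
<!ATTLIST BOOK %info;>
<!ATTLIST ARTICLE %info;>
```

Both elements end up with the same id and lang attributes, and a later change to the info entity updates both lists at once.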
Parsing is the process of checking the syntax of your document and creating the “tree structure.” If you are using a validating parser, the process will also compare the XML file to its DTD.
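The difference between checking syntax and validating can be seen in a small sketch like this one, where the element names are invented. The document's syntax is correct, but it does not follow its own DTD.

```xml
<?xml version="1.0"?>
<!DOCTYPE BOOK [
  <!ELEMENT BOOK (TITLE, AUTHOR)>
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT AUTHOR (#PCDATA)>
]>
<!-- Syntax is correct, so any parser can build the tree.
     AUTHOR is missing, so a validating parser reports an error against the DTD. -->
<BOOK>
  <TITLE>Learning DTDs</TITLE>
</BOOK>
```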
There are a number of online parsers. To use these, you typically type in the URI of your file and tell the process to begin.
- Online validating parser, from the W3C
The W3C offers an online parser. Type the URL of the file into the form and the XML file is both parsed and validated.
- Validating Parser from Brown University Scholarly Technology Group
This is the most easily accessible and understandable presentation of the online parsers.
There are many parsers that you can download and run on your local machine. Most of these require you to have either a Windows or UNIX machine. They are written in a variety of languages; this is a cross section of some of the many that are available.
- James Clark’s expat parser
James Clark is almost a brand in the SGML/XML world. His rendition of an XML parser is widely used.
- Java-based Validating XML Parser
From IBM’s AlphaWorks group, this parser claims to be 100% pure Java.
- Microsoft XML Parser in C++
A parser from Microsoft.
- XML Parser written in Python
This is a validating parser.
This parser is non-validating and checks XML syntax only.
- SiRPAC, Simple RDF Parser and Compiler
From the W3C.