XML Concepts

Using XML for Web pages is just the tip of the XML story. With XML you can store information about your documents and the pieces of your documents. You can then use that information as criteria for displaying pages, but also for validating digital signatures, sharing data across systems, processing data for other applications, and much more. Once you have an agreed-upon method of storing metadata, the possibilities are almost endless.

With XML you create the tags you need for your documents.

XML stands for eXtensible Markup Language. Each of these words describes an important part of what XML is and what it does.

The first word in XML is Extensible. This is the word that gives XML much of its strength and flexibility.

In HTML, there is a specified set of tags. You memorize these tags and use them. If you want to use any other tag, you are out of luck; what you see is what you can use. Period.

In XML, you create the tags you want to use. XML extends your ability to describe a document, letting you define meaningful tags for your applications. For example, if your site contains many glossary terms, you can create a tag for those terms. If it contains part numbers, you could create a tag for those as well. You can create as few or as many tags as your document needs.
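As a rough illustration of defining your own tags, the sketch below parses a small document with Python's standard `xml.etree.ElementTree` module. The tag names `term` and `partnum`, and the sample text, are invented for this example and are not taken from the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names: "term" and "partnum" are invented for illustration.
SAMPLE = """<page>
  <para>Attach the <term>flange</term> using bolt <partnum>A-1047</partnum>.</para>
  <para>A <term>gasket</term> seals the joint.</para>
</page>"""

root = ET.fromstring(SAMPLE)

# Because the tags say what each piece is, finding every glossary term
# or part number is a simple lookup by tag name.
terms = [el.text for el in root.iter("term")]
part_numbers = [el.text for el in root.iter("partnum")]

print(terms)
print(part_numbers)
```

The parser needs no advance knowledge of these tag names; it only requires that the document follow XML syntax.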

This can be a little disconcerting at first because you can’t go to a reference guide and look up the tags to use. But it also gives you great freedom and flexibility because you can define and use tags in a way that makes sense for your documents.

Extensibility means you get more options and more power, but with those capabilities comes a need for planning. To make good use of XML, you’ll want to know and understand your documents: what pieces comprise them, how those pieces relate to each other, and how you want to identify each piece.

Remember, though, that you are extending your tags to identify elements by what they are, not by how they look. You are not creating tags to identify elements as “10 point bold.” You are creating tags to identify “chapter headings” or “book titles” or “players.”

With XML you identify, or mark up, elements within your document.

The second word in XML is Markup. This is the purpose of XML: to identify elements within your document.

Markup, be it XML, HTML, or your word processing program’s own markup, is essential for documents to make sense. Without markup, the computer sees your document as one long string of text, with each character having equal importance to every other character. Without markup, your document is just one big clump of bits.

By marking up your document, you begin to give meaning to the pieces within. You identify the bits and pieces in a way that gives them value and context: “this is a paragraph,” “this is a song title,” “this is a section head.” And, with extensible markup, you can mark up the document in ways that match your needs.

However, it is important to remember that markup is just a way of identifying information. This is an important and critical step, but by itself markup does nothing. Markup does not program the data to act in a certain way, to display in a certain way, or to do anything other than carry an identifying mark.

With XML you follow a set of rules and syntax to mark up your document.

The third word in XML is Language. This states that XML follows a firm set of rules. It may let you create an extensible set of markup tags, but its structure and syntax remain firm and clearly defined.

In technology, the term “language” is often automatically appended to the word “programming” as in “programming language.” Too often, people assume that all languages are for programming and for creating a set of actions. But a language is just a way of describing something, be it a program’s actions or a markup definition.

Extensible Markup Language is a means of marking up data, using a specific syntax.
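To make the idea of firm syntax rules concrete, here is a minimal sketch that asks Python's standard XML parser whether a string follows those rules. The helper name `is_well_formed` and the sample strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<book><title>King Lear</title></book>"
bad_nesting = "<book><title>King Lear</book></title>"  # tags overlap
bad_unclosed = "<book><title>King Lear</title>"         # <book> is never closed

print(is_well_formed(good))
print(is_well_formed(bad_nesting))
print(is_well_formed(bad_unclosed))
```

You are free to choose the tag names, but every tag must still be closed and properly nested, or the parser rejects the document.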

XML applies structure to documents. Documents are sets of related information.

The term structure seems to bring some unpleasant imagery with it, especially for creative souls who want to make this medium work in new and innovative ways. But when one is dealing with publishing, the term structure is quite positive. It is the way we put a skeleton behind information, so that the pieces of information work together and make sense as a whole.

There are two key principles behind a structured model:

  1. Each part, or element, has a relationship with other elements. This series of relationships defines the structure.
  2. The meaning of the element is separate from its visual appearance.

We can’t really talk about structure without first talking a bit about documents. Document is another of those terms that conjures up somewhat negative images; one tends to picture “dusty stacks of documents” or “attorney’s documents” or “document processing.” But in this case, a document is simply a collection of related information.

For example, this page is a document. Your favorite ‘zine is a set of documents. Your intranet is probably comprised of hundreds if not thousands of documents.

Sometimes documents are created as a single unit. Sometimes they are built on demand, pulling pieces from a database and assembling them into a document as the reader requests. In both cases, structure makes the document easier to create, maintain, and display.

Document Structure
The document structure defines the elements that make up a document, the information you want to collect about those elements, and the relationship those elements have to each other. You use XML to mark up the document, following the structure you have decided upon.

By treating a document as a collection of elements, you free it from the constraints of time, place, and presentation format. You can move the structured document from a word processor to a PDA to a Web browser. The structure is intact on each; you just alter the display characteristics for each device.

The document structure is called the document tree. The main trunk of the tree is the parent. All the branches and leaves are children. Document trees are usually visually represented as a hierarchical chart.
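The parent-and-children idea can be sketched by walking a parsed document and printing one indented line per element. The element names in the sample and the helper `outline` are hypothetical, chosen only to show the shape of a tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element names are illustrative only.
SAMPLE = """<book>
  <title>King Lear</title>
  <chapter>
    <heading>Chapter heading</heading>
    <para>Some paragraph text.</para>
  </chapter>
</book>"""

def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(SAMPLE))
print("\n".join(tree_lines))
```

The indentation in the printed outline mirrors the hierarchical chart described above: `book` is the trunk, and everything else hangs beneath it.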

Structure vs. Format
The most important thing to remember about a structured document is that it is defined by the elements it contains, not by how it looks.

Structure says that an element is a paragraph. Format says to display the paragraph in 12 point Times.

Structure says the element is a book title. Format says to display the book title in green bold body text.

Structure says the element is a social security number. Format says to hide and not display the social security number.

Learning to separate structure from format is critical in making good use of XML.
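One way to picture this separation is to keep the marked-up data in one place and the display rules in another. The sketch below is only an illustration: the element names, the `FORMAT_RULES` table, and the `render` helper are all assumptions, and a real site would use a stylesheet rather than a Python dictionary.

```python
import xml.etree.ElementTree as ET

# Structure: what each element is. Element names are hypothetical.
RECORD = "<record><booktitle>King Lear</booktitle><ssn>000-00-0000</ssn></record>"

# Format: how each kind of element is shown, kept apart from the data.
# A value of None means "do not display this element at all".
FORMAT_RULES = {
    "booktitle": "green bold",
    "ssn": None,
}

def render(xml_text, rules):
    """Pair each displayable element's text with its format rule."""
    shown = []
    for element in ET.fromstring(xml_text):
        style = rules.get(element.tag, "default")
        if style is None:
            continue  # hidden by the format rules, still present in the data
        shown.append((element.text, style))
    return shown

displayed = render(RECORD, FORMAT_RULES)
print(displayed)
```

Changing `FORMAT_RULES` changes what the reader sees without touching the document itself.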

Metadata is data about data. A key use of XML is to collect and work with metadata.

At its most basic level, XML is a metadata language. That is, it is a way of assigning information to pieces of data. The most obvious use of this is to identify a piece of data as a certain structural element. But this is just the beginning.

XML is about much more than marking up documents for use in a Web browser. XML is really about adding layers of information to your data, so that the data can be processed, used, and transferred between applications.

Metadata in HTML
If you’ve built a Web site, you’ve almost certainly worked with metadata. The keyword and description meta tags are simple uses of metadata. With these meta tags you can give the document as a whole some information about the general type of content it contains. This information doesn’t display in a Web browser, but it does display in search engine results.

Another use of meta tags is to store information such as creator name and creation date. Some servers are structured to work with these meta tags, allowing you to sort by creation date or display based on creator name.

Going Further with XML
XML takes this basic idea much further. With XML, you can describe where you found your data, and you can quantify, qualify, and further define it. You can then use this metadata to validate information, perform searches, set display constraints, or process other data.

Here are just a few examples:

  • XML initiatives are under way which will allow for digital signature verification and validated form submission. This could make it possible for forms, with signatures, to be submitted online and be legally binding.
  • XML initiatives are under way to help catalog Web content. Using metadata, the Web can be indexed better and searched more effectively.
  • XML is being used to transfer data, based on factors such as date entered, between unlike databases. The metadata is both a means to find the correct data bits and a common language of transfer between databases which do not speak each other’s specific language.
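The last example, selecting data by a factor such as date entered, can be sketched with metadata stored as XML attributes. Everything in the sample is hypothetical: the `order` element, the `entered` and `source` attributes, and the helper `entered_on_or_after` are invented for this illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical records: the "entered" and "source" attributes are metadata,
# information about each record rather than the record's own content.
SAMPLE = """<orders>
  <order entered="1999-03-02" source="warehouse">Widget</order>
  <order entered="1999-06-15" source="web">Gadget</order>
  <order entered="1999-07-01" source="web">Sprocket</order>
</orders>"""

def entered_on_or_after(xml_text, cutoff):
    """Return the text of every order whose 'entered' date is >= cutoff."""
    selected = []
    for order in ET.fromstring(xml_text).iter("order"):
        entered = date.fromisoformat(order.get("entered"))
        if entered >= cutoff:
            selected.append(order.text)
    return selected

recent = entered_on_or_after(SAMPLE, date(1999, 6, 1))
print(recent)
```

Because the metadata travels with the data in a common syntax, a receiving system can apply the same kind of filter without knowing how the sending database stored the records.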

The RDF Proposal
One W3C-blessed use of metadata which you may have heard about is a proposal called the Resource Description Framework, or RDF. RDF is an application of XML for making metadata machine-processable. It allows applications to exchange information about data automatically. This has implications in indexing, content rating, intellectual ownership, e-commerce, and privacy, among other things. The W3C says:

RDF with digital signature will be the key to building the “Web of Trust” for electronic commerce, collaboration, and other applications.

Display Issues
XML alone will not display a page. You must use a formatting technology, such as CSS or XSL, to display XML-tagged documents in a Web browser.

XML is about separating structure and format. An XML document doesn’t know anything about how to display itself. It relies on other technologies for this.

Although XML does not deal with form, it contains a great deal of information about the document and its elements. This, when combined with style tools, gives you a whole new strength and flexibility in displaying your documents without having to maintain multiple copies of the document.

Extensible Stylesheet Language, XSL, is the future of XML display. It is an XML-based language for expressing stylesheets.

With XSL, you can make context-sensitive display decisions. For example, you could automatically display the document one way in a Web browser and another on a PDA.

XSL can also transform XML into HTML, so that older browsers can view XML documents.
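The idea of transforming XML into HTML can be sketched without XSL itself. Python's standard library does not include an XSLT processor, so the example below hand-writes a tiny transformation instead; it is a stand-in for the concept, not an XSL stylesheet. The `TAG_MAP` table, the element names, and the `to_html` helper are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from our own element names to HTML element names.
TAG_MAP = {"play": "div", "title": "h1", "line": "p"}

SOURCE = "<play><title>King Lear</title><line>A line of text.</line></play>"

def to_html(element):
    """Recursively copy an element, renaming tags according to TAG_MAP."""
    html_el = ET.Element(TAG_MAP.get(element.tag, "span"))
    html_el.text = element.text
    html_el.tail = element.tail
    for child in element:
        html_el.append(to_html(child))
    return html_el

html = ET.tostring(to_html(ET.fromstring(SOURCE)), encoding="unicode")
print(html)
```

The source document keeps its meaningful tags; only the copy sent to the browser is rewritten into tags the browser already knows how to display.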

Cascading Style Sheets, CSS-1 and CSS-2, are the current way to display XML documents in a Web browser. CSS is a means of assigning display values to page elements.

If you are going to be working with XML and you will be concerned with displaying pages, learn CSS. The CSS Reference Guide contains a guide to the CSS-1 properties.

Behaviors are a non-standard, IE5 technique that lets you do some interesting display actions with XML tags. They combine scripting and CSS in a component file. This component can be attached to a particular tag and used in many different documents. The Behaviors Library shows some of the things you can do with this technique.

The Document Object Model lets you address, change, and manipulate any individual portion of the Web page.

The phrase “document object model” means that you treat your document as a collection of individual objects, rather than a single solid unit. The W3C DOM is the set of rules for doing this in a standard way in a Web browser, with HTML and XML files.

O Is for Object
In an object-oriented approach, the program or the document is made up of many smaller components called objects. The smaller components can be re-arranged, added to, or removed dynamically.

The idea of objects has become quite popular in both software and documents. The programming language Java and the scripting language JavaScript each has an object-oriented philosophy at its core. The adoption of the standard DOM enables Web pages to share that object approach too.

With an object model, you manage the small pieces, combining them and reusing them as it makes sense, instead of writing one huge application program or one huge document. You might think of an object approach as being a little like a collection of Lego blocks: different pieces do different things, but you can combine and recombine them into many different finished projects.

Each object type acts as a template. You can use an instance of the same object over and over again. For example, you might have multiple instances of the canine element in a document. All the objects share the same name, canine, and work the same way, but each one represents its own set of data and can be addressed individually.
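A short sketch using Python's `xml.dom.minidom`, which implements a subset of the W3C DOM, shows several instances of one element type being addressed individually. The element name `canine` comes from the text; the enclosing `kennel` element, the `name` attribute, and the sample values are hypothetical.

```python
from xml.dom import minidom

# The "canine" element name comes from the text; the "name" attribute and
# the surrounding "kennel" element are hypothetical.
SAMPLE = """<kennel>
  <canine name="Rex">terrier</canine>
  <canine name="Fido">collie</canine>
</kennel>"""

doc = minidom.parseString(SAMPLE)
canines = doc.getElementsByTagName("canine")

# Every instance has the same tag name but carries its own data.
names = [c.getAttribute("name") for c in canines]
breeds = [c.firstChild.data for c in canines]

print(names)
print(breeds)
```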

It isn’t enough to merely know that an object is an object. You also need to know how to talk to that object and give it commands. That’s where the API comes in.

API stands for Application Programming Interface. An API is a set of rules that describes how you can access and manipulate an object. The DOM specification describes the API for HTML and XML documents.

The DOM, by providing a standard API, defines the naming conventions, programming models, and other rules for communicating with an object in an HTML or XML page.

Getting from XML to Objects
In an XML document, each element is actually an object: it has a name and it has attributes that describe it.

The browser, combined with a stylesheet, displays each of the XML elements/objects in a Web page. Because they are objects, you can address and change them individually.

Ah, but just knowing that every piece is an object isn’t enough. You need to have a set of rules, an API, to describe how to address those objects when they are placed in a Web page. That’s where the DOM comes in.

The DOM does three things. You might think of it as explaining the “who, what, and how” of the Web page.

  1. First, it describes who: which objects are in a Web page, and how are XML objects represented there?
  2. Second, it defines what: what can these objects do, and how do they work with others?
  3. Third, it defines how: how can these objects be addressed?

The DOM is the translator, the interface that lets all the pieces be represented properly, talk to each other, and communicate with scripts and other action tools.

It is XML that lets you add and identify data, but it is the DOM that lets the script manipulate and display that data on command in the Web browser window.
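As a sketch of what "address, change, and manipulate" looks like through a DOM API, the example below uses Python's `xml.dom.minidom` rather than a browser script. The method names are standard DOM calls; the sample document and the `format` attribute are assumptions for illustration.

```python
from xml.dom import minidom

doc = minidom.parseString("<catalog><title>King Lear</title></catalog>")

# Address an existing object through the DOM API.
title = doc.getElementsByTagName("title")[0]

# Change it: replace the text and add an attribute (hypothetical name).
title.firstChild.data = "Hamlet"
title.setAttribute("format", "paperback")

# Add a new object to the tree.
new_title = doc.createElement("title")
new_title.appendChild(doc.createTextNode("Macbeth"))
doc.documentElement.appendChild(new_title)

result = doc.documentElement.toxml()
print(result)
```

A browser script would use the same kinds of calls, since the DOM defines them independently of any one programming language.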

Pulling It All Together
You’ll typically be working with four technologies that combine to create an interactive Web page: XML (or HTML), a scripting language, CSS, and the DOM. This illustration shows their relationship.

[Illustration: component interaction]
  • XML identifies data. For example: “King Lear” is a title element.
  • CSS stores information about display values for elements and delivers the information to the browser. For example: Titles are displayed in 18 point black courier type.
  • The script “talks” to the objects and sends messages to and from the browser about the objects. Typically these are “change your display” or “do this” messages based on user actions or other variables. For example: If a particular title is out of stock, display it in red.
  • The DOM provides the common interface through which various scripts and objects talk to one another and display in the Web browser.
  • The browser displays the results to the end user.

If any of these pieces is missing, you can’t create a dynamically changing presentation of your document.
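The out-of-stock example can be sketched end to end, with each piece playing its role: XML holds the data, a small table stands in for CSS, a function stands in for the script, and `xml.dom.minidom` supplies the DOM calls. The `instock` attribute, the class names, and the `STYLES` table are hypothetical, and printing a dictionary stands in for the browser's display.

```python
from xml.dom import minidom

# Hypothetical data: the "instock" attribute is invented for this sketch.
SAMPLE = """<catalog>
  <title instock="yes">King Lear</title>
  <title instock="no">Hamlet</title>
</catalog>"""

# CSS-like display values, keyed by a class name.
STYLES = {"normal": "18pt black courier", "unavailable": "18pt red courier"}

doc = minidom.parseString(SAMPLE)

def apply_stock_rule(document):
    """Play the script's role: set each title's class from its stock status."""
    for title in document.getElementsByTagName("title"):
        in_stock = title.getAttribute("instock") == "yes"
        title.setAttribute("class", "normal" if in_stock else "unavailable")

apply_stock_rule(doc)

# Stand-in for the browser: look up the display values for each title.
display = {
    t.firstChild.data: STYLES[t.getAttribute("class")]
    for t in doc.getElementsByTagName("title")
}
print(display)
```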

