The Document Object Model (DOM) is a W3C standard that defines a set of interfaces for representing an XML (or HTML) document as a tree of objects. A DOM tree defines the logical structure of a document and controls the way you can access and manipulate that document programmatically. Using the DOM, developers can create XML or HTML documents, navigate their structure, and add, modify, or delete elements and content. The DOM is language-neutral, so implementations exist for many programming languages; this article uses the DOM extension for PHP 5, which is bundled with PHP and enabled by default, so you don't need to install anything extra.
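As a minimal sketch of the create, add, modify, and delete operations just mentioned, the following snippet builds a tiny document in memory. The element names and values are illustrative assumptions, not the contents of the article's Book.xml.

```php
<?php
// Build a small document from scratch (element names are illustrative only).
$doc = new DOMDocument('1.0', 'UTF-8');
$book = $doc->createElement('book');
$doc->appendChild($book);

// Add: two child elements with text content.
$name = $doc->createElement('name', 'Draft title');
$book->appendChild($name);
$author = $doc->createElement('author', 'John Smith Jr.');
$book->appendChild($author);

// Modify: replace the text content of <name>.
$name->nodeValue = 'XML Processing I';

// Delete: remove the <author> element from its parent.
$book->removeChild($author);

// Serialize only the root element, which omits the XML declaration.
$result = $doc->saveXML($book);
echo $result, "\n"; // <book><name>XML Processing I</name></book>
```

Passing a node to saveXML() serializes just that subtree, which keeps the output short for a demonstration like this one.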
A DOM tree consists of nodes named according to XML conventions. For example, some familiar DOM nodes are:
- The Document node, represented by the DOMDocument interface
- Element nodes, represented by the DOMElement interface
- Attribute nodes, represented by the DOMAttr interface
- Comment nodes, represented by the DOMComment interface
- Text nodes, represented by the DOMText interface
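To see how these node types map onto PHP classes, the sketch below loads a small inline document and prints the class of one node of each kind. The XML string is a hypothetical stand-in, written without whitespace so that no extra text nodes appear between elements.

```php
<?php
// Hypothetical one-line document; not the article's Book.xml.
$doc = new DOMDocument();
$doc->loadXML('<book id="1"><!-- sample --><name>XML Processing I</name></book>');

$root = $doc->documentElement; // the <book> element

// Pick one node of each kind from the tree.
$nodes = array(
    'document'  => $doc,
    'element'   => $root,
    'attribute' => $root->getAttributeNode('id'),
    'comment'   => $root->firstChild,             // <!-- sample -->
    'text'      => $root->lastChild->firstChild,  // text inside <name>
);

// Record and print the PHP class behind each node.
$classes = array();
foreach ($nodes as $label => $node) {
    $classes[$label] = get_class($node);
    echo $label, ': ', $classes[$label], "\n";
}
```

The loop prints DOMDocument, DOMElement, DOMAttr, DOMComment, and DOMText, in that order.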
This section demonstrates how to extract elements and values from a DOM tree. As an example, the article uses the Book.xml document in Listing 1.
To follow along with the example, save the Book.xml document into the same directory as the downloadable PHP example applications.
The first sample application loads the Book.xml document into a DOM tree and displays the first occurrence of several elements using the getElementsByTagName method, which both the DOMDocument and DOMElement interfaces provide:
DOMNodeList DOMElement::getElementsByTagName(string $name): The method returns a list of all descendant elements having the tag name specified by the $name parameter. The following example calls the method on the document itself, so each search covers the whole tree: it finds the <book> elements, then all <author>, <publisher>, and <name> elements, selecting the first one of each. Finally, it prints those nodes' values.
// Create a document instance
$doc = new DOMDocument();
// Load the Book.xml file; load() returns false on failure
if ( !$doc->load( 'Book.xml' ) ) {
    exit( "Could not load Book.xml\n" );
}
// Search the whole document for elements with the "book" tag name
$books = $doc->getElementsByTagName( "book" );
// Search for all elements with the "author" tag name
$authors = $doc->getElementsByTagName( "author" );
// Get the text value of the first "author" element found
$author = $authors->item(0)->nodeValue;
// Search for all elements with the "publisher" tag name
$publishers = $doc->getElementsByTagName( "publisher" );
// Get the text value of the first "publisher" element found
$publisher = $publishers->item(0)->nodeValue;
// Search for all elements with the "name" tag name
$titles = $doc->getElementsByTagName( "name" );
// Get the text value of the first "name" element found
$title = $titles->item(0)->nodeValue;
// Print the values found
echo "$title - $author - $publisher \n";
The last line prints the first title, author, and publisher found, separated by hyphens. The output is:
XML Processing I - John Smith Jr. - HisOwnTM
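The listing above searches the whole document and keeps only the first match of each tag. As a rough sketch of the element-scoped form, the snippet below calls getElementsByTagName on each <book> element in turn, so every title is paired with the author from the same book. The two-book catalog, including its second title and author, is a hypothetical example and not the article's Book.xml.

```php
<?php
// Hypothetical two-book catalog; not the article's Book.xml.
$xml = '<catalog>'
     . '<book><name>XML Processing I</name><author>John Smith Jr.</author></book>'
     . '<book><name>XML Processing II</name><author>Jane Doe</author></book>'
     . '</catalog>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// Document-wide search: every <author> element in the tree.
$allAuthors = $doc->getElementsByTagName('author');
echo $allAuthors->length, "\n"; // 2

// Element-scoped search: only descendants of the current <book>.
$pairs = array();
foreach ($doc->getElementsByTagName('book') as $book) {
    $name   = $book->getElementsByTagName('name')->item(0);
    $author = $book->getElementsByTagName('author')->item(0);
    // item() returns null when nothing matched, so guard before reading nodeValue.
    if ($name !== null && $author !== null) {
        $pairs[] = $name->nodeValue . ' - ' . $author->nodeValue;
    }
}

echo implode("\n", $pairs), "\n";
```

Scoping the search to an element avoids mixing values from different books, which a document-wide search followed by item(0) cannot guarantee once the file holds more than one record.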