Accessing and Manipulating the DOM with PHP


The Document Object Model (DOM) is a W3C standard based on a set of interfaces that represent an XML (or HTML) document as a tree of objects. A DOM tree defines the logical structure of a document and controls how you can access and manipulate that document programmatically. Using the DOM, developers can create XML or HTML documents, navigate their structure, and add, modify, or delete elements and content. You can access the DOM from any programming language; this article uses the DOM extension for PHP 5, which is part of the PHP core, so you don’t need to install anything extra.

A DOM tree consists of nodes named according to XML conventions. For example, some familiar DOM nodes are:

  • The Document node, represented by the DOMDocument interface
  • Element nodes, represented by the DOMElement interface
  • Attribute nodes, represented by the DOMAttr interface
  • Comment nodes, represented by the DOMComment interface
  • Text nodes, represented by the DOMText interface
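
To make these node types concrete, the following minimal sketch parses a small inline document and prints the PHP class behind each kind of node. The inline XML is a hypothetical stand-in, not the article's Listing 1.

```php
<?php
// Hypothetical inline document, used only for illustration.
$xml = '<book id="b1"><!-- a comment --><name>XML Processing I</name></book>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$book    = $doc->documentElement;          // the root element
$attr    = $book->getAttributeNode('id');  // the id attribute
$comment = $book->firstChild;              // the comment node
$name    = $comment->nextSibling;          // the <name> element
$text    = $name->firstChild;              // the text inside <name>

// Collect the class name of each node
$classes = array();
foreach (array($doc, $book, $attr, $comment, $text) as $node) {
    $classes[] = get_class($node);
}
echo implode(', ', $classes), "\n";
```

Each class in the output corresponds to one of the interfaces listed above.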

Extracting Elements

This section demonstrates how to extract elements and values from a DOM tree. As an example, the article uses the Book.xml document in Listing 1.

To follow along with the example, save the Book.xml document into the same directory as the downloadable PHP example applications.

The first sample application uses the Book.xml document, extracts the associated tree, and displays the first occurrences of several child nodes using the getElementsByTagName method from the DOMElement interface:

DOMNodeList DOMElement::getElementsByTagName(string $name): The method returns a list of all descendant elements having the tag name specified by the $name parameter. DOMDocument provides a method of the same name, which the example calls on the document object. The following example loads the document, then finds all its <author>, <publisher>, and <name> elements, selecting the first one of each. Finally, it prints those nodes’ values.

<?php
// Create the document object and load the XML file
$doc = new DOMDocument();
$doc->load('Book.xml');

//Searches for all elements with the "book" tag name
$books = $doc->getElementsByTagName("book");

//Searches for all elements with the "author" tag name
$authors = $doc->getElementsByTagName("author");
//Returns the first element found having the tag name "author"
$author = $authors->item(0)->nodeValue;

//Searches for all elements with the "publisher" tag name
$publishers = $doc->getElementsByTagName("publisher");
//Returns the first element found having the tag name "publisher"
$publisher = $publishers->item(0)->nodeValue;

//Searches for all elements with the "name" tag name
$titles = $doc->getElementsByTagName("name");
//Returns the first element found having the tag name "name"
$title = $titles->item(0)->nodeValue;

//Printing the found values
echo "$title - $author - $publisher\n";
?>

The last line prints the first title, first author, and first publisher found, separated by hyphens. The output is:

XML Processing I - John Smith Jr. - HisOwnTM
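
The example above reads only the first match of each list. A DOMNodeList can also be traversed with foreach, as in this minimal sketch. The inline document, including its second book and author, is hypothetical and only stands in for Book.xml.

```php
<?php
// Hypothetical inline document standing in for Book.xml.
$xml = '<books>'
     . '<book><name>XML Processing I</name><author>John Smith Jr.</author></book>'
     . '<book><name>XML Processing II</name><author>Jane Doe</author></book>'
     . '</books>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// Walk every <name> element instead of taking only item(0)
$names = array();
foreach ($doc->getElementsByTagName('name') as $nameNode) {
    $names[] = $nameNode->nodeValue;
}

echo count($names), ' titles: ', implode('; ', $names), "\n";
```

The length property of the list gives the same count without looping.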

Browsing a DOM Tree Recursively

Because of the structure of XML, where tags either contain other tags (branches of the tree) or are leaf tags containing no child tags, you can browse an entire tree or subtree recursively by starting with any node, and following each child branch to the ending leaf nodes. The following example browses any XML subtree given a starting root node ($node), and lists the name and value of each encountered node.

function getNodesInfo($node)
{
   if ($node->hasChildNodes())
   {
      $subNodes = $node->childNodes;
      foreach ($subNodes as $subNode)
      {
         // Node type 3 is a text node; skip text nodes that hold only white space
         if (($subNode->nodeType != 3) ||
             (($subNode->nodeType == 3) &&
              (strlen(trim($subNode->wholeText)) >= 1)))
         {
            echo "Node name: ".$subNode->nodeName."\n";
            echo "Node value: ".$subNode->nodeValue."\n";
         }
         // Descend into the children of the current node
         getNodesInfo($subNode);
      }
   }
}

The preceding example skips whitespace-only text nodes to clean up the output, using this conditional test:

if (($subNode->nodeType != 3) ||
    (($subNode->nodeType == 3) &&
     (strlen(trim($subNode->wholeText)) >= 1)))

The preceding code checks whether the node being processed is either a non-text node (nodeType != 3) or a text node whose text is not empty after trimming. Alternatively, you could set the DOMDocument preserveWhiteSpace property to FALSE before loading the document, which tells the parser to discard whitespace-only text nodes. By default, the value is TRUE.
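
The effect of preserveWhiteSpace is easiest to see by counting child nodes. The following minimal sketch loads the same indented document twice. The inline XML is a hypothetical stand-in for Book.xml.

```php
<?php
// Hypothetical indented document; the line breaks and spaces become text nodes.
$xml = "<book>\n  <name>XML Processing I</name>\n  <author>John Smith Jr.</author>\n</book>";

// Default behavior: white space between tags is kept as text nodes
$keep = new DOMDocument();
$keep->loadXML($xml);

// The property must be set before the document is loaded
$strip = new DOMDocument();
$strip->preserveWhiteSpace = false;
$strip->loadXML($xml);

$withBlanks    = $keep->documentElement->childNodes->length;
$withoutBlanks = $strip->documentElement->childNodes->length;

echo "Default: $withBlanks child nodes\n";
echo "preserveWhiteSpace = false: $withoutBlanks child nodes\n";
```

With the property set to FALSE, only the two element children should remain, so a recursive listing needs no whitespace test of its own.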

To test the function, this next example passes the root node of the Book.xml document to the recursive getNodesInfo function, which then prints the tags and values of the entire DOM tree:

<?php
// Create the document object and load the XML file
$dom = new DOMDocument();
$dom->load('Book.xml');

//Setting the objects tree root
$root = $dom->firstChild;

// Recursive function to list all nodes of a subtree
function getNodesInfo($node)
{
   if ($node->hasChildNodes())
   {
      $subNodes = $node->childNodes;
      foreach ($subNodes as $subNode)
      {
         if (($subNode->nodeType != 3) ||
             (($subNode->nodeType == 3) &&
              (strlen(trim($subNode->wholeText)) >= 1)))
         {
            echo "Node name: ".$subNode->nodeName."\n";
            echo "Node value: ".$subNode->nodeValue."\n";
         }
         getNodesInfo($subNode);
      }
   }
}

//The getNodesInfo function call
getNodesInfo($root);
?>

Here are the prototypes for the methods used to create new nodes and add them to a tree:

  • DOMElement DOMDocument::createElement(string $name [, string $value ]): This method creates an instance of the DOMElement class. The $name argument represents the tag name for the new element, and the $value argument represents the value of the element. You can also set the value later, using the element's nodeValue property.
  • DOMText DOMDocument::createTextNode(string $content): This method creates an instance of the DOMText class. The $content argument represents the text content for the new text node.
  • DOMNode DOMNode::appendChild(DOMNode $newnode): This function appends the $newnode argument at the end of an existing list of child nodes, or creates a new child node list containing the specified node.
  • DOMNode DOMNode::insertBefore(DOMNode $newnode [,DOMNode $refnode]): This method inserts the $newnode argument before the reference node specified by $refnode. If $refnode is missing, the new node is appended to the end of the node’s list of child nodes.
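
The following minimal sketch exercises all four methods on a tiny tree. The inline document is hypothetical. In current PHP versions, calling insertBefore without a reference node places the new node at the end of the child list, just as appendChild does.

```php
<?php
// Hypothetical one-element document, used only for illustration.
$dom = new DOMDocument();
$dom->loadXML('<book><name>XML Processing I</name></book>');
$root = $dom->documentElement;

// createElement plus createTextNode, instead of passing the value directly
$author = $dom->createElement('author');
$author->appendChild($dom->createTextNode('John Smith Jr.'));

// With a reference node: <author> goes in front of <name>
$root->insertBefore($author, $root->firstChild);

// Without a reference node: the new node goes to the end
$root->insertBefore($dom->createElement('publisher', 'HisOwnTM'));

// List the resulting order of the children
$order = array();
foreach ($root->childNodes as $child) {
    $order[] = $child->nodeName;
}
echo implode(', ', $order), "\n";
```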

The following example creates a <bibliography> element and appends it as the last child of the root node:

//Create a new element
$newElement = $dom->createElement('bibliography', 'Martin Didier, Professional XML');

// Add it to the root using the appendChild method
//The appendNewChild function call
appendNewChild($root, $newElement);

//This function appends a new child node
function appendNewChild($currentNode, $node)
{
   $currentNode->appendChild($node);
}
Figure 2. The Appended Node: The figure shows the new node and its contents at the end of the document.

If you run the results through the getNodesInfo() function, you’ll see output similar to Figure 2.

This next example adds a new <foreword> element as the first child of the first <publisher> node.

//Create a new element
$newElement = $dom->createElement('foreword',
   'What I love about this book is that it '.
   'grew out of just such a process, '.
   'and shows it on every page.');

//Set the reference node
$allContents = $dom->getElementsByTagName('publisher');
$contents = $allContents->item(0);

//Call the insertNewChild function
insertNewChild($contents, $newElement);

//This function inserts a new child
//as the first child of $currentNode
function insertNewChild($currentNode, $node)
{
   $currentNode->insertBefore($node, $currentNode->firstChild);
}

Running the modified document through getNodesInfo shows the new node (see Figure 3).

Figure 3. Inserting Nodes: This output shows the new <foreword> child node inserted as the first child of the <publisher> node.

Cloning a Node

Cloning a node means creating a new node of the same type and (optionally) with the same content as an existing node. You can clone nodes using the cloneNode method:

DOMNode DOMNode::cloneNode([ bool $deep]): Creates a clone of the current node; the $deep argument specifies whether to also copy descendants of the current node. The default value is FALSE. For example, the following code clones the <author> element and appends the clone as a child of the original <author> element. Figure 4 shows the output:

//Set the reference node
$author = $root->getElementsByTagName('author')->item(0);

//Call the cloningNode function
cloningNode($author);

//This function clones the $currentNode
function cloningNode($currentNode)
{
   // Deep copy: the clone includes all descendants
   $clonenode = $currentNode->cloneNode(true);
   $newnode = $currentNode->appendChild($clonenode);
}
Figure 4. Cloning Nodes: Cloning the child node and appending it to the original node results in this output. The doubled text value of the original node occurs because retrieving the text value of a node retrieves its child node text values as well.
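
The difference between a shallow and a deep clone can be sketched as follows. The inline document and its id attribute are hypothetical.

```php
<?php
// Hypothetical document; the id attribute exists only for this illustration.
$dom = new DOMDocument();
$dom->loadXML('<book><author id="a1">John Smith Jr.</author></book>');
$author = $dom->getElementsByTagName('author')->item(0);

$shallow = $author->cloneNode();      // $deep defaults to false
$deep    = $author->cloneNode(true);  // copies the text child as well

// Serialize each clone to compare them
echo 'Shallow: ', $dom->saveXML($shallow), "\n";
echo 'Deep:    ', $dom->saveXML($deep), "\n";
```

The shallow clone keeps the element name and its attributes but has no children, while the deep clone also carries the text content. Neither clone is part of the tree until you attach it.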

Removing Child Nodes

To remove a node from the DOM tree, use the removeChild method:

DOMNode DOMNode::removeChild(DOMNode $oldnode): This function removes a child node. The $oldnode argument specifies which child node to remove. As an example, the following code removes the <bibliography> child from the Book.xml document. You can see from the results in Figure 5 that the bibliography node is missing:

//Get a reference to the bibliography node
$bibliography = $root->getElementsByTagName('bibliography')->item(0);

//Call the removingChild function
removingChild($bibliography);

//This function removes the $currentNode node
function removingChild($currentNode)
{
   // $root is not visible inside the function, so go through the node's parent
   $oldbibliography = $currentNode->parentNode->removeChild($currentNode);
}
Figure 5. Removing Nodes: After removing the last child node (<bibliography>, inserted earlier with the appendChild method), listing the node names and values shows that the node is indeed gone.
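
Removing every element with a given tag name needs a little care, because the list returned by getElementsByTagName is live and shrinks as nodes are removed. One way to handle this, shown here as a minimal sketch on a hypothetical inline document, is to copy the list into an array first.

```php
<?php
// Hypothetical document with two <author> elements.
$dom = new DOMDocument();
$dom->loadXML('<book><author>A</author><author>B</author><name>N</name></book>');

// Copy the live list first so removals do not shift it mid-loop
$authors = iterator_to_array($dom->getElementsByTagName('author'));

foreach ($authors as $author) {
    $author->parentNode->removeChild($author);
}

echo $dom->saveXML($dom->documentElement), "\n";
```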

Replacing Nodes

To replace an existing node with a new node, use the replaceChild method:

DOMNode DOMNode::replaceChild(DOMNode $newnode, DOMNode $oldnode): This function replaces the child node $oldnode with $newnode and returns the replaced node. The $oldnode argument must be a child of the node on which you call the method.

For example, suppose you want to replace the <ISBN> child node with a new <code> child node:

//Get the ISBN node
$element = $dom->getElementsByTagName('ISBN')->item(0);

//Create the new element
$code = $dom->createElement('code', '909090');

//Call the replacingNode function
replacingNode($code, $element);

//This function replaces $oldNode with $newNode
function replacingNode($newNode, $oldNode)
{
   $oldNode->parentNode->replaceChild($newNode, $oldNode);
}

The output in Figure 6 shows that the node was replaced.

Figure 6. Replacing Nodes: Here's the relevant portion of the document after replacing the <ISBN> node with the new <code> node.
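
The value returned by replaceChild is the node that was taken out of the tree, which is handy if you want to keep or reuse it. A minimal sketch, using a hypothetical inline document and ISBN value:

```php
<?php
// Hypothetical document; the ISBN value is a placeholder.
$dom = new DOMDocument();
$dom->loadXML('<book><ISBN>0000000000</ISBN></book>');

$isbn = $dom->getElementsByTagName('ISBN')->item(0);
$code = $dom->createElement('code', '909090');

// replaceChild is called on the parent and hands back the old node
$old = $isbn->parentNode->replaceChild($code, $isbn);

echo $dom->saveXML($dom->documentElement), "\n";
echo 'Replaced node: ', $old->nodeName, "\n";
```

After the call, the old node still exists as an object but no longer has a parent.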

Importing Nodes

Use the importNode method to copy a node from another tree to the current tree:

DOMNode DOMDocument::importNode(DOMNode $importedNode [,bool $deep]): This method creates a copy of a node from another XML document and associates the copy with the current document. The copy is not yet part of the tree; you still have to insert it, for example with appendChild. The $importedNode argument specifies the node to import. Because the imported node is a copy of the original node, the import does not alter the external tree. The $deep argument controls whether the method imports a deep copy of the node. When TRUE, the method imports the entire node subtree; when FALSE, it imports only the node.
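
The following minimal sketch shows why the import step matters: appending a node that still belongs to another document raises a DOMException, while appending the imported copy works. Both inline documents and the <topic> tag name are hypothetical.

```php
<?php
// Hypothetical source and target documents.
$source = new DOMDocument();
$source->loadXML('<continue><topic>XPath</topic></continue>');

$target = new DOMDocument();
$target->loadXML('<book/>');

$foreign = $source->documentElement;

// Appending a node owned by another document fails
$error = null;
try {
    $target->documentElement->appendChild($foreign);
} catch (DOMException $e) {
    $error = $e->getMessage();
}
echo 'Without import: ', $error, "\n";

// Import a deep copy first, then append the copy
$copy = $target->importNode($foreign, true);
$target->documentElement->appendChild($copy);

echo $target->saveXML($target->documentElement), "\n";
```

The source document keeps its original node, since only the copy moves into the target tree.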

As an example, this next application imports the <continue> node from the Book_continue.xml file into Book.xml. First, here's the Book_continue.xml document contents:

     XPath   XPath is language for...            

And here's the code to import the node:

<?php
// Load the document that contains the node to import
$olddoc = new DOMDocument();
$olddoc->load("Book_continue.xml");

// The node we want to import to a new document
$node = $olddoc->getElementsByTagName("continue")->item(0);

$newdoc = new DOMDocument;
$newdoc->formatOutput = true;
$newdoc->load("Book.xml");

// Import the node, and all its children, to the document
$node = $newdoc->importNode($node, true);

// And then append it to the root node
$newdoc->documentElement->appendChild($node);

echo "\nThe 'new document' after copying the nodes into it:\n";

$root = $newdoc->firstChild;

function getNodesInfo($node)
{
   if ($node->hasChildNodes())
   {
      $subNodes = $node->childNodes;
      foreach ($subNodes as $subNode)
      {
         if (($subNode->nodeType != 3) ||
             (($subNode->nodeType == 3) &&
              (strlen(trim($subNode->wholeText)) >= 1)))
         {
            echo "Node name: ".$subNode->nodeName."\n";
            echo "Node value: ".$subNode->nodeValue."\n";
         }
         getNodesInfo($subNode);
      }
   }
}

getNodesInfo($root);
?>

bool DOMNode::isSameNode(DOMNode $node): This function returns a Boolean TRUE when both references point to the very same node in the tree, and FALSE otherwise. The $node argument represents the node to which you want to compare the current node.

Note that the comparison is not based on the content of the nodes.

//Checking if two nodes are the same
$author1 = $root->getElementsByTagName('author')->item(0);
$author2 = $root->getElementsByTagName('author')->item(1);

//The verifyNodes function call (only if both nodes exist)
if ($author1 !== null && $author2 !== null)
{
   verifyNodes($author1, $author2);
}

function verifyNodes($currentNode, $node)
{
   if ($currentNode->isSameNode($node))
   {
      echo "These two nodes are the same";
   }
}
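
A minimal sketch of what isSameNode does and does not consider equal, using a hypothetical inline document: two references to one node compare as the same, but a clone with identical content does not.

```php
<?php
// Hypothetical one-author document.
$dom = new DOMDocument();
$dom->loadXML('<book><author>John Smith Jr.</author></book>');

$a     = $dom->getElementsByTagName('author')->item(0);
$b     = $dom->documentElement->firstChild;  // the same node, reached another way
$clone = $a->cloneNode(true);                // same content, different node

var_dump($a->isSameNode($b));      // bool(true)
var_dump($a->isSameNode($clone));  // bool(false)
```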

Creating a New Tree

You don't have to start with an existing tree; the DOM extension for PHP 5 lets you build trees from scratch. The following example creates a completely new XML document. It also uses two new methods that let you create comment and CDATA nodes:

  • DOMComment DOMDocument::createComment(string $data): Creates a new comment node. The $data argument represents the node content.
  • DOMCDATASection DOMDocument::createCDATASection(string $data): Creates a new CDATA node. The $data argument represents the node content.
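
As a minimal sketch of these two methods, the following code builds a tiny document in memory and serializes it. The <flowers> and <tulip> element names and the node contents are hypothetical and are not taken from Listing 2.

```php
<?php
// Start from an empty document
$dom = new DOMDocument('1.0', 'UTF-8');

// Hypothetical element names, chosen only for this illustration
$root = $dom->appendChild($dom->createElement('flowers'));
$root->appendChild($dom->createComment('hypothetical sample data'));

$tulip = $root->appendChild($dom->createElement('tulip'));

// A CDATA section can hold characters such as & and < without escaping
$tulip->appendChild($dom->createCDATASection('Parrot & <Lily flowering>'));

$xml = $dom->saveXML();
echo $xml;
```

To write the result to a file instead of a string, DOMDocument::save takes a file name.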

The example in Listing 2 creates an object tree and saves it as Flowers.xml.

The new Flowers.xml document looks like this:

      Parrot    Lily flowering      Sword Lily    Starface  ]]>

This brief introduction to the DOM extension for PHP 5 should give you enough background to manipulate existing XML (or HTML) documents, or to create them from scratch.

