Accessing and Manipulating the DOM with PHP

The Document Object Model (DOM) is a W3C standard based on a set of interfaces that represent an XML (or HTML) document as a tree of objects. A DOM tree defines the logical structure of a document and controls the way you can access and manipulate that document programmatically. Using the DOM, developers can create XML or HTML documents, navigate their structure, and add, modify, or delete elements and content. You can access the DOM from many programming languages; this article uses the DOM extension for PHP 5, which is part of the PHP core, so you don’t need to install anything extra.

A DOM tree consists of nodes named according to XML conventions. For example, some familiar DOM nodes are:

The Document node, represented by the DOMDocument interface
Element nodes, represented by the DOMElement interface
Attribute nodes, represented by the DOMAttr interface
Comment nodes, represented by the DOMComment interface
Text nodes, represented by the DOMText interface
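
As a minimal sketch of how these node types show up in practice, the following snippet parses a tiny inline document and reports the class and nodeType of each node. The inline XML string and the describeNode helper are illustrative assumptions, not part of the article's sample files.

```php
<?php
// Hypothetical inline document used only for illustration.
$xml = '<book id="b1"><!-- a comment -->Some text</book>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// Returns the class name and numeric node type of a node.
function describeNode(DOMNode $node): string
{
    return get_class($node) . ' (nodeType ' . $node->nodeType . ')';
}

$root = $doc->documentElement;
$lines = [
    describeNode($doc),                          // the document itself
    describeNode($root),                         // the <book> element
    describeNode($root->getAttributeNode('id')), // the id attribute
];
foreach ($root->childNodes as $child) {
    $lines[] = describeNode($child);             // the comment, then the text
}
echo implode("\n", $lines), "\n";
```

Each entry pairs one of the interfaces listed above with its numeric node type, which is the value the later examples compare against when filtering text nodes.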

Extracting Elements

This section demonstrates how to extract elements and values from a DOM tree. As an example, the article uses the Book.xml document in Listing 1.

To follow along with the example, save the Book.xml document into the same directory as the downloadable PHP example applications.

The first sample application uses the Book.xml document, extracts the associated tree, and displays the first occurrences of several child nodes using the getElementsByTagName method from the DOMElement interface:

DOMNodeList DOMElement::getElementsByTagName(string $name): The method returns a list of all descendant elements having the tag name specified by the $name parameter. The following example loads the document, then finds all its descendant <author>, <publisher>, and <name> elements, selecting the first one of each. Finally, it prints those nodes’ values.

<?php
$doc = new DOMDocument();
$doc->load('Book.xml');

//Searches for all elements with the "book" tag name
$books = $doc->getElementsByTagName("book");

//Searches for all elements with the "author" tag name
$authors = $doc->getElementsByTagName("author");
//Returns the first element found having the tag name "author"
$author = $authors->item(0)->nodeValue;

//Searches for all elements with the "publisher" tag name
$publishers = $doc->getElementsByTagName("publisher");
//Returns the first element found having the tag name "publisher"
$publisher = $publishers->item(0)->nodeValue;

//Searches for all elements with the "name" tag name
$titles = $doc->getElementsByTagName("name");
//Returns the first element found having the tag name "name"
$title = $titles->item(0)->nodeValue;

//Printing the found values
echo "$title - $author - $publisher\n";
?>

The last line prints the first title, first author, first publisher found, separated with hyphens. The output is:

XML Processing I - John Smith Jr. - HisOwnTM
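
The same lookup extends naturally to every match rather than only the first. The sketch below assumes a small inline document with <book>, <name>, and <author> elements; its structure and the second book are guesses standing in for Book.xml, which is not reproduced here.

```php
<?php
// Assumed stand-in for Book.xml; the real file's layout may differ.
$xml = <<<XML
<books>
  <book><name>XML Processing I</name><author>John Smith Jr.</author></book>
  <book><name>XML Processing II</name><author>Mark Lee</author></book>
</books>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$titles = [];
// DOMNodeList is traversable, so foreach visits every matching element.
foreach ($doc->getElementsByTagName('name') as $nameNode) {
    $titles[] = $nameNode->nodeValue;
}

echo implode(', ', $titles), "\n";
```

Iterating with foreach avoids hard-coding item(0) and works however many matching elements the document contains.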

Browsing a DOM Tree Recursively

Because of the structure of XML, where tags either contain other tags (branches of the tree) or are leaf tags containing no child tags, you can browse an entire tree or subtree recursively by starting with any node, and following each child branch to the ending leaf nodes. The following example browses any XML subtree given a starting root node ($node), and lists the name and value of each encountered node.

function getNodesInfo($node)
{
   if ($node->hasChildNodes())
   {
      $subNodes = $node->childNodes;
      foreach ($subNodes as $subNode)
      {
         if (($subNode->nodeType != 3) ||
            (($subNode->nodeType == 3) &&
            (strlen(trim($subNode->wholeText)) >= 1)))
         {
            echo "Node name: ".$subNode->nodeName."\n";
            echo "Node value: ".$subNode->nodeValue."\n";
         }
         getNodesInfo($subNode);
      }
   }
}

The preceding example skips whitespace-only text nodes to clean up the output, using this conditional test:

if (($subNode->nodeType != 3) ||
   (($subNode->nodeType == 3) &&
   (strlen(trim($subNode->wholeText)) >= 1)))

The preceding code checks whether the node being processed is either a non-text node (nodeType != 3) or a text node whose text is not empty after trimming. Alternatively, you could set the DOMDocument preserveWhiteSpace property to FALSE before loading the document, which tells the parser to drop redundant white space. By default, the value is TRUE.
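
To illustrate the preserveWhiteSpace alternative, the sketch below loads the same indented document twice and counts the root's child nodes each time. The inline XML is a hypothetical stand-in for Book.xml.

```php
<?php
// Hypothetical indented document; the whitespace between tags
// becomes text nodes unless the parser is told to drop it.
$xml = "<book>\n  <name>XML Processing I</name>\n  <author>John Smith Jr.</author>\n</book>";

$keep = new DOMDocument();
$keep->loadXML($xml);                 // preserveWhiteSpace defaults to true

$strip = new DOMDocument();
$strip->preserveWhiteSpace = false;   // must be set before loading
$strip->loadXML($xml);

$withBlanks = $keep->documentElement->childNodes->length;
$withoutBlanks = $strip->documentElement->childNodes->length;

echo "Child nodes with whitespace kept: $withBlanks\n";
echo "Child nodes with whitespace dropped: $withoutBlanks\n";
```

With the property set to FALSE, only the two element children remain, so a traversal no longer needs the explicit text-node test.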

To test the function, this next example passes the root node of the Book.xml document to the recursive getNodesInfo function, which then prints the tags and values of the entire DOM tree:

<?php
$dom = new DOMDocument();
$dom->load('Book.xml');

//Setting the objects tree root
$root = $dom->firstChild;

//Recursive function to list all nodes of a subtree
function getNodesInfo($node)
{
   if ($node->hasChildNodes())
   {
      $subNodes = $node->childNodes;
      foreach ($subNodes as $subNode)
      {
         if (($subNode->nodeType != 3) ||
            (($subNode->nodeType == 3) &&
            (strlen(trim($subNode->wholeText)) >= 1)))
         {
            echo "Node name: ".$subNode->nodeName."\n";
            echo "Node value: ".$subNode->nodeValue."\n";
         }
         getNodesInfo($subNode);
      }
   }
}

//The getNodesInfo function call
getNodesInfo($root);
?>

Here are the prototypes for the methods used to create new nodes and add them to a tree:

DOMElement DOMDocument::createElement(string $name [, string $value ]): This method creates an instance of the DOMElement class. The $name argument represents the tag name for the new element, and the $value argument represents the value of the element. You can also set the value later, using the DOMElement->nodeValue property.
DOMText DOMDocument::createTextNode(string $content): This method creates an instance of the DOMText class. The $content argument represents the text content for the new text node.
DOMNode DOMNode::appendChild(DOMNode $newnode): This method appends the $newnode argument at the end of an existing list of child nodes, or creates a new child node list containing the specified node.
DOMNode DOMNode::insertBefore(DOMNode $newnode [,DOMNode $refnode]): This method inserts the $newnode argument before the reference node specified by $refnode. If $refnode is omitted, the new node is appended to the end of the node’s list of child nodes.
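
The four methods above can be combined as in the following sketch. The starting document is a hypothetical one-element <book>, chosen only to keep the example self-contained.

```php
<?php
// Hypothetical starting document for illustration.
$dom = new DOMDocument();
$dom->loadXML('<book><name>XML Processing I</name></book>');
$root = $dom->documentElement;

// createElement with a value, appended as the last child.
$publisher = $dom->createElement('publisher', 'HisOwnTM');
$root->appendChild($publisher);

// createElement plus createTextNode, inserted before <publisher>.
$author = $dom->createElement('author');
$author->appendChild($dom->createTextNode('John Smith Jr.'));
$root->insertBefore($author, $publisher);

// Collect the child tag names in document order.
$order = [];
foreach ($root->childNodes as $child) {
    $order[] = $child->nodeName;
}
echo implode(' > ', $order), "\n";
```

Note that nodes are created by the document but only become part of the tree once appendChild or insertBefore attaches them to a parent.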

The following example creates a <bibliography> node and appends it as the last child of the root node:

//Create a new element
$newElement = $dom->createElement('bibliography', 'Martin Didier, Professional XML');

//Add it to the root using the appendChild method
//The appendNewChild function call
appendNewChild($root, $newElement);

//This function appends a new child node
function appendNewChild($currentNode, $node)
{
   $currentNode->appendChild($node);
}


Figure 2. The Appended Node: The figure shows the new node and its contents at the end of the document.

If you run the results through the getNodesInfo function, you’ll see output similar to Figure 2.

This next example inserts a new <foreword> element as the first child of the first <publisher> node.

//Create a new <foreword> element
$newElement = $dom->createElement('foreword',
   'What I love about this book is that it '.
   'grew out of just such a process, '.
   'and shows it on every page.');

//Set the reference node
$allContents = $dom->getElementsByTagName('publisher');
$contents = $allContents->item(0);

//Call the insertNewChild function
insertNewChild($contents, $newElement);

//This function inserts a new child
//as the first child of $currentNode
function insertNewChild($currentNode, $node)
{
   $currentNode->insertBefore($node, $currentNode->firstChild);
}

Running the modified document through getNodesInfo shows the new node (see Figure 3).


Figure 3. Inserting Nodes: This output shows the new <foreword> child node inserted at the start of the <publisher> node.

Cloning a Node

Cloning a node means creating a new node of the same type as an existing node and (optionally) with the same content. You can clone nodes using the cloneNode method:

DOMNode DOMNode::cloneNode([ bool $deep]): Creates a clone of the current node; the $deep argument specifies whether to also copy descendants of the current node. The default value is FALSE. For example, the following code clones the <author> element and appends the clone as a child of the original <author> element. Figure 4 shows the output:

//Set the reference node
$author = $root->getElementsByTagName('author')->item(0);

//Call the cloningNode function
cloningNode($author);

//This function clones the $currentNode
function cloningNode($currentNode)
{
   $clonenode = $currentNode->cloneNode(true);
   $newnode = $currentNode->appendChild($clonenode);
}


Figure 4. Cloning Nodes: Cloning the child node and appending it to the original node results in this output. The doubled text value of the original node occurs because retrieving the text value of a node retrieves its child node text values as well.
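
The effect of the $deep argument can be seen in a short sketch. The inline document is a hypothetical single-author <book>; only the difference between the two clones matters here.

```php
<?php
// Hypothetical document with one <author> element.
$dom = new DOMDocument();
$dom->loadXML('<book><author>John Smith Jr.</author></book>');
$author = $dom->getElementsByTagName('author')->item(0);

$shallow = $author->cloneNode(false); // the element itself, no descendants
$deep = $author->cloneNode(true);     // the element plus its descendants

$shallowText = $shallow->nodeValue;   // empty: the text child was not copied
$deepText = $deep->nodeValue;         // the copied text child's content

echo "Shallow clone text: '$shallowText'\n";
echo "Deep clone text: '$deepText'\n";
```

Either clone is a detached node with no parent until you attach it to the tree, for example with appendChild.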

Removing Child Nodes

To remove a node from the DOM tree use the removeChild method:

DOMNode DOMNode::removeChild(DOMNode $oldnode): This method removes a child node. The $oldnode argument specifies which child node to remove. As an example, the following code removes the <bibliography> child from the Book.xml document. You can see from the results in Figure 5 that the bibliography node is missing:

//Get a reference to the bibliography node
$bibliography = $root->getElementsByTagName('bibliography')->item(0);

//Call the removingChild function
removingChild($bibliography);

//This function removes the $currentNode node from its parent
//($root is not visible inside the function, so use parentNode)
function removingChild($currentNode)
{
   $oldbibliography = $currentNode->parentNode->removeChild($currentNode);
}


Figure 5. Removing Nodes: After removing the last child node (<bibliography>, inserted earlier with the appendChild method), listing the node names and values shows that the node is indeed gone.
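
When several nodes need to be removed, one approach is to copy the node list into an array first, because the list returned by getElementsByTagName is live and changes as nodes are removed. This is a sketch under that assumption; the <note> elements are hypothetical.

```php
<?php
// Hypothetical document with two <note> elements to delete.
$dom = new DOMDocument();
$dom->loadXML('<book><name>XML Processing I</name><note>a</note><note>b</note></book>');

// Copy the live list to an array before removing nodes;
// otherwise the list shrinks while it is being iterated.
$notes = iterator_to_array($dom->getElementsByTagName('note'));
$removed = 0;
foreach ($notes as $note) {
    // Ask each node's own parent to remove it.
    $note->parentNode->removeChild($note);
    $removed++;
}

$remaining = $dom->getElementsByTagName('note')->length;
echo "Removed $removed, remaining $remaining\n";
```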

Replacing Nodes

To replace an existing node with a new node, use the replaceChild method:

DOMNode DOMNode::replaceChild(DOMNode $newnode, DOMNode $oldnode): This method replaces the child node $oldnode with $newnode. If $newnode already exists in the tree, it is first removed from its current position.

For example, suppose you want to replace the ISBN child node with a new code child node:

//Get the ISBN node
$element = $dom->getElementsByTagName('ISBN')->item(0);

//Create the new <code> element
$code = $dom->createElement('code', '909090');

//Call the replacingNode function
replacingNode($code, $element);

//This function replaces $oldNode with $newNode
function replacingNode($newNode, $oldNode)
{
   $oldNode->parentNode->replaceChild($newNode, $oldNode);
}


The output in Figure 6 shows that the node was replaced.



 


Figure 6. Replacing Nodes: Here's the relevant portion of the document after replacing the <ISBN> node with the new <code> node.
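
A self-contained sketch of the same replacement follows. The inline document and the ISBN value are hypothetical; the point is that replaceChild is called on the parent and returns the node it removed.

```php
<?php
// Hypothetical document; the ISBN value is made up for illustration.
$dom = new DOMDocument();
$dom->loadXML('<book><name>XML Processing I</name><ISBN>111</ISBN></book>');

$old = $dom->getElementsByTagName('ISBN')->item(0);
$new = $dom->createElement('code', '909090');

// replaceChild is called on the parent and returns the replaced node.
$replaced = $old->parentNode->replaceChild($new, $old);

// Serialize only the root element, without the XML declaration.
$result = $dom->saveXML($dom->documentElement);
echo $result, "\n";
```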


Importing Nodes

Use the importNode method to copy a node from another tree to the current tree:

DOMNode DOMDocument::importNode(DOMNode $importedNode [,bool $deep]): This method creates a copy of a node from another XML document and associates the copy with the current document. The copy is not yet part of the tree; you place it with a method such as appendChild. The $importedNode argument specifies the node to import. Because the imported node is a copy of the original node, the import does not alter the external tree. The $deep argument controls whether the method makes a deep copy of the imported node. When TRUE, the method imports the entire node subtree; when FALSE, it imports only the node itself.
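
As a minimal sketch of the import-then-attach sequence, the following snippet uses two in-memory documents in place of XML files. The element names other than <continue>, and the document contents, are assumptions made for illustration.

```php
<?php
// Two hypothetical in-memory documents standing in for XML files.
$source = new DOMDocument();
$source->loadXML('<extra><continue><topic>XPath</topic></continue></extra>');

$target = new DOMDocument();
$target->loadXML('<book><name>XML Processing I</name></book>');

$original = $source->getElementsByTagName('continue')->item(0);

// importNode returns a copy owned by $target; it is not yet in the tree.
$copy = $target->importNode($original, true);
$attachedBefore = $copy->parentNode !== null;

// Attach the copy explicitly.
$target->documentElement->appendChild($copy);

$result = $target->saveXML($target->documentElement);
echo $result, "\n";
```

The source document still contains its own <continue> element afterward, since only a copy was moved into the target tree.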

As an example, this next application imports the <continue> node from the Book_continue.xml file into Book.xml. First, here's the Book_continue.xml document contents:
     XPath   XPath is language for...            
And here's the code to import the <continue> node:

<?php
$olddoc = new DOMDocument;
$olddoc->load("Book_continue.xml");

// The node we want to import to a new document
$node = $olddoc->getElementsByTagName("continue")->item(0);

$newdoc = new DOMDocument;
$newdoc->formatOutput = true;
$newdoc->load("Book.xml");

// Import the node, and all its children, to the document
$node = $newdoc->importNode($node, true);

// And then append it to the root node
$newdoc->documentElement->appendChild($node);

echo "\nThe 'new document' after copying the nodes into it:\n";

$root = $newdoc->firstChild;

function getNodesInfo($node)
{
   if ($node->hasChildNodes())
   {
      $subNodes = $node->childNodes;
      foreach ($subNodes as $subNode)
      {
         if (($subNode->nodeType != 3) ||
            (($subNode->nodeType == 3) &&
            (strlen(trim($subNode->wholeText)) >= 1)))
         {
            echo "Node name: ".$subNode->nodeName."\n";
            echo "Node value: ".$subNode->nodeValue."\n";
         }
         getNodesInfo($subNode);
      }
   }
}

getNodesInfo($root);
?>

To check whether two variables refer to the same node, use the isSameNode method:

bool DOMNode::isSameNode(DOMNode $node): This method returns TRUE when the current node and $node are the same node, and FALSE otherwise. The $node argument represents the node to which you want to compare the current node.

Note that the comparison is not based on the content of the nodes: two distinct nodes with identical content are not the same node. For example, the following code compares the first two <author> nodes:

//Checking whether two variables refer to the same node
$author1 = $root->getElementsByTagName('author')->item(0);
$author2 = $root->getElementsByTagName('author')->item(1);

//The verifyNodes function call
verifyNodes($author1, $author2);

function verifyNodes($currentNode, $node)
{
   if ($currentNode->isSameNode($node))
   {
      echo "These two nodes are the same";
   }
}
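
The distinction between identity and content can be sketched as follows. The inline document, with two <author> elements holding identical text, is a hypothetical example.

```php
<?php
// Hypothetical document with two <author> elements of identical content.
$dom = new DOMDocument();
$dom->loadXML('<book><author>John Smith Jr.</author><author>John Smith Jr.</author></book>');

$authors = $dom->getElementsByTagName('author');
$first = $authors->item(0);
$second = $authors->item(1);
$firstAgain = $dom->documentElement->firstChild;

// Same content, but two different nodes in the tree.
$sameAsSecond = $first->isSameNode($second);
// Two references that point at one node.
$sameAsItself = $first->isSameNode($firstAgain);

var_dump($sameAsSecond, $sameAsItself);
```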

Creating a New Tree

You don't have to start with an existing tree; the DOM extension for PHP 5 lets you build trees from scratch. The following example creates a completely new XML document. It also uses two new methods that let you create comment and CDATA nodes:

DOMComment DOMDocument::createComment(string $data): Creates a new comment node. The $data argument represents the node content.
DOMCDATASection DOMDocument::createCDATASection(string $data): Creates a new CDATA section node. The $data argument represents the node content.

The example in Listing 2 creates an object tree and saves it as Flowers.xml.

The new Flowers.xml document looks like this:
      Parrot    Lily flowering      Sword Lily    Starface  ]]>
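
As a minimal sketch of building a small tree from scratch, the snippet below creates a root element, a comment, an ordinary element, and a CDATA section, then serializes the result. The element names and content are illustrative assumptions, not those of Listing 2.

```php
<?php
// Element names and content below are illustrative assumptions.
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;

// The root element must be appended to the document itself.
$root = $dom->createElement('flowers');
$dom->appendChild($root);

$root->appendChild($dom->createComment('A small sample tree'));

$tulip = $dom->createElement('tulip', 'Parrot');
$root->appendChild($tulip);

// CDATA keeps markup-like characters literal instead of escaping them.
$note = $dom->createElement('note');
$note->appendChild($dom->createCDATASection('Text with <markup> kept as-is'));
$root->appendChild($note);

$xml = $dom->saveXML();
echo $xml;
// $dom->save('Flowers.xml') would write the same markup to disk.
```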
This brief introduction to the DOM extension for PHP 5 should give you enough background to manipulate existing XML (or HTML) documents, or to create them from scratch. 