Hierarchical Database Processing: Internal Operations
XML documents are hierarchically structured, composed of multiple pathways that contain a goldmine of hierarchical semantic information between the structure's pathways. Supporting multi-path queries with their additional semantics provides many advantages, and enables many advanced capabilities. Processing between multiple hierarchical pathways requires a complex coordination logic known as Lowest Common Ancestor (LCA) processing. This enables the correct semantics to be associated with multiple paths of hierarchical queries to produce meaningful hierarchical results. Here's an example:
SELECT ALL FROM StructureView WHERE F.f='F3' AND G.g='G4'
|Figure 6. Increased Data Value Example: The figure shows the structure that results from running the query "SELECT ALL FROM StructureView WHERE F.f='F3' AND G.g='G4'."|
Running the preceding query produces the structure shown in Figure 6
A query such as this requires processing across paths. When WHERE
clause conditions occur on two separate paths, the Lowest Common Ancestor node controls the coordination. Earlier, this article mentioned that sibling segments were independent. In practice, this means that the multiple data occurrences of nodes F and G are not tested together in any specific order as described earlier for twin node occurrences. Instead, they are tested in all combinations under the LCA node (the E node in this case). This limits the range combination to produce a meaningful result.
A second use of LCA logic is when the SELECT
clause references cross paths, such as when a data field on one path is selected based on data in another path. This is the case in Figure 6
, where the LCA sits at the top of all the SELECT
list items and the WHERE
clause condition—node A in Figure 6
. When the LCA data occurrence qualifies, all data occurrences under it qualify. This again follows the hierarchical data filtering flow specified in Figure 3
. SQL automatically carries out LCA processing in its relational Cartesian product processing.
XML Markup Vs XML Database Data
Coders and users navigate XML hierarchically today, but that's not always necessary for hierarchical database structures. In the past, database hierarchical structures could be queried without the user specifying any navigation, because every defined node type in the structure was unique in the structure (as we have seen so far) and had only a single path to it as in Figure 6
. Those two qualities make querying the structure unambiguous. In such cases, XML navigation can be performed internally automatically.
|Figure 7. Markup Hierarchical Node Structure: In markup-type hierarchical node structures such as XML, you can locate any specific node through navigation.|
In the markup hierarchical node structure shown in Figure 7
, the D node type occurs in multiple locations. In other words, locating a specific D node requires navigation, such as the XPath query A/B/D
. Alternatively, you could use the XPath //D
search operation to locate the closest existing data occurrence of a D node—which could be either A/B/D
. Such queries allow fuzzy operations such as finding the next closest matching keyword. Fuzzy operations are OK for markup uses, but are not OK for database operations, which need exact precise processing semantics. Imagine summing the "price" for books and inadvertently adding in the price for magazines when there is no data occurrence of "price" for a book. Database hierarchical processing should not have duplicate node types in a structure and should not have to use XPath search operations in database processing (except when it's isolated in separate search functions). User navigation is not necessary for database processing.
XML's use in databases was an afterthought; XML was designed to process markup data, not to store critical database data. Markup data and database data have very different uses and should not be processed identically. Markup data is more unpredictable and hierarchically forgiving than database data should be because the same node data types can occur in many locations in the hierarchical structure. Markup tag names are used as the node names they represent in their XML hierarchical structure to classify the text (i.e. which person said what) or to specify processing directives to be applied to the text (i.e. bold, underline). For example, in Figure 7
, the D node type occurs in multiple locations, which makes the structure ambiguous for database querying and requires navigation to locate a specific D node type. You can temporarily correct this situation for database use by renaming nodes when necessary. SQL supports this.
The ideal solution for correct database XML hierarchical processing is separating database data processing from markup data processing. As you've seen, each has its own particular logic and processing rules. Separating the two achieves correct database results and frees the database to take full advantage of hierarchical data's unique capabilities, such as navigationless hierarchical processing.