Navigationless Database XML: Hierarchical Data Processing

Although current databases limit XML processing to linear XPath or XQuery queries, existing hierarchical database techniques enable far more complex queries using standard SQL.

XML data in standard database processing is not being used fully or correctly in business applications today. Current XML hierarchical database query processing is basically limited to single-path, linear hierarchical processing. This limitation could reflect the influence of relational join processing, or it might occur because navigating multiple hierarchical paths is too difficult and error-prone to be practical with XML's manual navigation. XQuery has the same limitation because it also requires manual navigation. Eliminating manual navigation from database XML processing removes these limitations, allowing navigationless structure processing to be unrestricted and performed automatically.

This article describes how business applications can take advantage of powerful new capabilities stemming from properly processed database XML data, using advanced hierarchical processing. It also delves into the underlying hierarchical structure principles and processing. While this level of hierarchical processing is not generally available today, it was very common three decades ago—before the advent of relational databases. However, hierarchical XML data is ubiquitous; therefore, it's time to apply the full power of such semantically rich structures.

Full hierarchical processing in databases is based on principles that have been proven to produce correct hierarchical data results. These principles can compensate for the lack of any W3C specification or best-practices reference for performing correct hierarchical processing.

Hierarchical Data Structure: Basics
XML data resides in a hierarchical structure that contains multiple pathways connected by named nodes as in Figure 1, where the nodes have been named alphabetically—"A" through "G." These names are associated with a node type or definition. Don't confuse the node type with the node data occurrence; they're different, even though the difference in meaning is not significant most of the time. The term node is usually sufficient.

Figure 1. Database Hierarchical Node Type Structure: Each node's name denotes its type.
Hierarchical structures require navigation through various node types to access the data they contain. For example, in Figure 1, accessing node D requires first navigating to node B from node A and then to node D. Note that the hierarchical structure represented in Figure 1 can be either a physical hierarchical structure like XML or a logical relational hierarchical structure modeled in SQL. Logical and physical hierarchical structures have exactly the same hierarchical processing principles. Just like physical structures, logical hierarchical structures usually require navigation through intervening nodes—in other words, to get to logical node D, you'd have to navigate through logical nodes A and then B. This is necessary to preserve the semantics of the hierarchical structure. Accessing node B directly could introduce B data values that should have been filtered out by first going through node A. Physical structures are usually inherently protected from this type of invalid hierarchical processing operation.
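The effect of skipping node A in a logical structure can be sketched in a few lines of Python. This is only an illustration: the rows, the `region` attribute, and the function names below are hypothetical and do not come from Figure 1.

```python
# Hypothetical rows for a logical (relational-style) hierarchy.
# Each B row points at its parent A row through a_id.
a_rows = [
    {"id": "A1", "region": "east"},
    {"id": "A2", "region": "west"},
]
b_rows = [
    {"id": "B1", "a_id": "A1"},
    {"id": "B2", "a_id": "A1"},
    {"id": "B3", "a_id": "A2"},
]


def b_direct():
    """Access B without navigating through A: every B row comes back."""
    return [b["id"] for b in b_rows]


def b_through_a(a_filter):
    """Navigate A -> B: only B rows under a qualifying A row come back."""
    keep = {a["id"] for a in a_rows if a_filter(a)}
    return [b["id"] for b in b_rows if b["a_id"] in keep]


print(b_direct())
print(b_through_a(lambda a: a["region"] == "east"))
```

Going straight to B returns B3 even though its parent A2 fails the filter; navigating through A first removes it, which is the semantic the hierarchy is meant to protect.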

Figure 1 also serves to introduce a little more hierarchical data processing terminology. Nodes C and D are children of node B, their parent node, and siblings of each other. All nodes under a particular node are its descendants: B, C, D, E, F, and G are descendants of node A, making node A their ancestor. Nodes A, B, and C make a path or pathway, indicated as A/B/C. Historically, in hierarchical databases, these hierarchical paths were known as legs. Sibling paths such as A/B/C and A/E/G must begin at a common originating node, node A in this case.
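These terms map onto a few small helper functions. The sketch below assumes the shape the text implies for Figure 1 (A at the root, B and E under A, C and D under B, G under E); placing F under E is an assumption, since the text only says F is a descendant of A. The function names are illustrative.

```python
# Assumed node-type structure for Figure 1 (F under E is a guess).
CHILDREN = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F", "G"]}


def parent_of(node):
    """Return the parent node type, or None for the root."""
    for par, kids in CHILDREN.items():
        if node in kids:
            return par
    return None


def siblings(node):
    """Other children of the same parent."""
    par = parent_of(node)
    return [k for k in CHILDREN.get(par, []) if k != node]


def descendants(node):
    """Every node below the given node, depth first."""
    out = []
    for kid in CHILDREN.get(node, []):
        out.append(kid)
        out.extend(descendants(kid))
    return out


def path(node):
    """Path (leg) from the root down to the node, e.g. A/B/C."""
    steps = []
    while node is not None:
        steps.append(node)
        node = parent_of(node)
    return "/".join(reversed(steps))


print(siblings("C"))
print(descendants("A"))
print(path("C"), path("G"))
```

Both sibling paths printed on the last line start at A, the common originating node.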

Hierarchical data structures have basic principles, and hierarchical data processing has operational principles. The basic hierarchical data structure principles are that a parent node can contain data without containing children, but no child node can exist without its parent node containing data. Operationally, this principle means that a given data pathway through the structure in Figure 1, such as A/B/C can terminate with node B when node C has no data—in other words, the data occurrence path terminates at the point where data stops. This is allowed because of hierarchical data preservation. In most other systems, this is not the default operation; for example, standard inner join relational processing would slice out the existing partial path data occurrences. Hierarchical data preservation allows the data occurrences along the same hierarchical pathways to vary in number of nodes reached.
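The contrast with inner join processing can be seen with a small experiment using Python's built-in `sqlite3` module. The table layout and data are hypothetical, and using `LEFT JOIN` to keep partial paths is one assumed way to approximate hierarchical data preservation in SQL, not something the text above prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id TEXT PRIMARY KEY);
    CREATE TABLE b (id TEXT PRIMARY KEY, a_id TEXT REFERENCES a(id));
    CREATE TABLE c (id TEXT PRIMARY KEY, b_id TEXT REFERENCES b(id));
    INSERT INTO a VALUES ('A1');
    INSERT INTO b VALUES ('B1', 'A1'), ('B2', 'A1');
    INSERT INTO c VALUES ('C1', 'B1');   -- B2 has no C children
""")

# Inner joins keep only complete A/B/C paths.
inner = conn.execute("""
    SELECT a.id, b.id, c.id FROM a
    JOIN b ON b.a_id = a.id
    JOIN c ON c.b_id = b.id
    ORDER BY b.id, c.id
""").fetchall()

# Left outer joins keep the partial path that stops at B2.
preserved = conn.execute("""
    SELECT a.id, b.id, c.id FROM a
    LEFT JOIN b ON b.a_id = a.id
    LEFT JOIN c ON c.b_id = b.id
    ORDER BY b.id, c.id
""").fetchall()

print(inner)
print(preserved)
```

The inner join result drops the A1/B2 path entirely, while the outer join result keeps it with an empty C position, so the path terminates where the data stops.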

Hierarchical Data Structure Processing: Basics
There are two basic types of hierarchical data processing structures: single node types and multiple node types. Both XML and hierarchical databases require multiple node types. Single node type hierarchical structures are fairly simple one-dimensional structures. They are used, for example, in organizational charts where each node represents a person. Usually, these node types can only contain a single data occurrence. Additional data occurrences are handled in single node type structures by creating another of the same node type to contain the child data.

Figure 2. Hierarchical Structure with Data: This node type structure uses multiple node types suitable for both hierarchical databases and XML.
Figure 2 adds data to the node type structure shown above in Figure 1. It uses multiple node types to define the more complex hierarchical structure required for databases, and each node type can contain the multiple data occurrences required for complex hierarchical structures in XML. For example, node type B has child data occurrences of C1, C2, C3, C4, D1, D2, D3, and D4. The B1 data occurrence of node type B has C1, C2, D1, and D2 child data occurrences. If the B1 data occurrence is removed, its child data occurrences are also removed (a cascading delete), along with their descendant data occurrences. However, the A1 parent data occurrence is preserved regardless of whether it has child data occurrences.
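A cascading delete over these data occurrences might look like the following sketch. The text places C1, C2, D1, and D2 under B1; putting C3, C4, D3, and D4 under a second occurrence named B2 is an assumption, and `delete_occurrence` is an illustrative name.

```python
# Parent link for each data occurrence; the root A1 has no parent.
# Assumption: C3, C4, D3, D4 belong to a second B occurrence, B2.
PARENT = {
    "A1": None,
    "B1": "A1", "B2": "A1",
    "C1": "B1", "C2": "B1", "D1": "B1", "D2": "B1",
    "C3": "B2", "C4": "B2", "D3": "B2", "D4": "B2",
}


def delete_occurrence(parent, victim):
    """Return a new parent map with victim and all its descendants removed."""
    doomed = {victim}
    changed = True
    while changed:  # keep sweeping until no new descendants are found
        changed = False
        for child, par in parent.items():
            if par in doomed and child not in doomed:
                doomed.add(child)
                changed = True
    return {c: p for c, p in parent.items() if c not in doomed}


after_b1 = delete_occurrence(PARENT, "B1")
print(sorted(after_b1))

after_both = delete_occurrence(after_b1, "B2")
print(sorted(after_both))
```

Removing B1 takes C1, C2, D1, and D2 with it; removing B2 as well leaves A1 standing alone, since a parent occurrence does not depend on having children.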

Node data occurrences D1 and D2 are known as twins. Node data occurrences D3 and D4 are a different set of twins (they are related by a different parent data occurrence). There is no implied order or meaning across sibling twin data occurrences such as C1, C2 and D1, D2. This means that C1, D1 has no implied relationship over C1, D2. These twin relationships are very important in multi-path hierarchical processing. Sibling paths are independent, but their processing still needs to be coordinated.
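As a rough illustration, twins can be modeled as occurrences that share both a node type and a parent occurrence. In the sketch below, the parent name B2 for D3 and D4 is an assumption (the text only says they have a different parent), and the helper name `twins` is illustrative.

```python
from itertools import product

# (occurrence, node type, parent occurrence)
OCCURRENCES = [
    ("C1", "C", "B1"), ("C2", "C", "B1"),
    ("D1", "D", "B1"), ("D2", "D", "B1"),
    ("D3", "D", "B2"), ("D4", "D", "B2"),  # parent B2 is assumed
]


def twins(node_type, parent):
    """Occurrences of one node type under one parent occurrence."""
    return [o for o, t, p in OCCURRENCES if t == node_type and p == parent]


print(twins("D", "B1"))
print(twins("D", "B2"))

# With no implied pairing across sibling twins, flattening C and D
# under B1 into rows yields every combination, none more meaningful
# than another.
pairs = list(product(twins("C", "B1"), twins("D", "B1")))
print(pairs)
```

The four C/D pairs under B1 show why sibling paths need coordinated processing: C1 is no more related to D1 than to D2, so any row-based result has to treat the combinations evenly.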
