Hierarchical Database Processing: External Operations
Filtering operations affect the entire hierarchical structure in a hierarchical manner. Filtering multi-path hierarchical queries is a much more complex and powerful operation than filtering single path queries. Standard linear hierarchical queries are limited because they filter data occurrences only upwardly and downwardly. In Figure 3
, filtering directly on node E can affect F and G descendent nodes going down the structure and the A ancestor node going up the structure. In this multi-path hierarchical structure, there are other sibling paths connected by node A (A/B/C
). When a data occurrence of node A is filtered out, the basic natural hierarchical data preservation rules of a hierarchical structure cause the underlying descendent data occurrences of nodes B, C, and D to be filtered out also. This highlights an important and powerful fact: Every node type in the hierarchical structure is related to every other node type in the structure, which gives full hierarchical processing its unlimited processing power.
|Figure 3. Hierarchical Data Filtering Flow: In a hierarchical database, filtering on node E can affect both descendent and ancestor nodes.|
The following examples use SQL to demonstrate performing hierarchical processing on hierarchical structures. SQL's relational structures map naturally to hierarchical structures. Relational tables hierarchically linked or joined together form a natural logical hierarchical structure: tables represent the node types and rows represent the data occurrences. SQL's generic SELECT
, and WHERE
commands operate on hierarchical structures naturally. The SELECT
operation selects XML and relational data items from a hierarchical structure; the FROM
operation models and specifies the modeled hierarchical structure view in SQL; and the WHERE
operation specifies hierarchical data filtering.
ANSI SQL supports full hierarchically processing inherently and transparently by using the hierarchically oriented LEFT OUTER JOIN
operation, which preserves data on the left side and not the right side. The code below defines the SQL multi-path hierarchical structure view from Figure 1
and is used in all the SQL examples:
CREATE View StructureView AS
SELECT A FROM
LEFT JOIN B ON A.PKey=B.FKey
LEFT JOIN C ON B.PKey=C.FKey
LEFT JOIN D ON B.PKey=D.FKey
LEFT JOIN E ON A.PKey=E.FKey
LEFT JOIN F ON E.PKey=F.FKey
LEFT JOIN G ON E.PKey=G.FKey
SQL also supports the required variable length pathway data occurrences required for hierarchical processing. Trailing partial missing pathways are padded with SQL null values. The SELECT
operation transfers selected nodes to the output structure, which preserves the basic structure but slices out the unselected nodes. All these hierarchical operations are reflected naturally in the relational rowset result—demonstrating that the SQL-to-hierarchical mapping is seamless, correct, and intuitive.
Hierarchical data processing operations process all requested pathways even if they have been terminated prematurely because of the hierarchical data preservation operation described previously. A data-filtering operation on a node data occurrence removes the node data occurrence and all its attached descendent node data occurrences. In many ways, it is easier to consider the WHERE
clause as a data qualifier because WHERE E.e='E1'
as used above in Figure 3
, is actually qualifying the E1 data occurrence and filtering out all other unrelated data occurrence values.
|Author's Note: This article shows the output of the SQL examples as hierarchical structure results rather than structured XML to make it easier to visualize the hierarchical operations taking effect.
|Figure 4. Sample Node Structure: This structure results from SQL WHERE filtering using the query "SELECT ALL FROM StructureView WHERE D.d='D3' AND G.g='G2'."|
shows the output of selected nodes using a WHERE
clause from the StructureView
structure shown in Figure 2
, applying the following SQL query:
SELECT ALL FROM StructureView WHERE D.d=’D3’ AND G.g=’G2’
Compare the result in Figure 4
with the full data view from Figure 2
. The query outputs both sides of the structure because it uses the SELECT
ALL command. Similarly, both sides were filtered because of the WHERE
condition. Node data occurrences B1 and E2, and their descendent occurrences were not output because neither of them met the WHERE
clause conditions. Nodes D3 and G2 were qualified directly in the WHERE
clause so their related data occurrences C3, C4, and F1, F2 qualified for inclusion as the hierarchical data filtering flow in Figure 3
above dictates. D4 and G1 were not qualified because their data occurrences could never be qualified together with the AND condition. The result is semantically correct.
Here's a different query that excludes specific nodes from the StructureView
structure in Figure 2
SELECT A.a, C.c, D.d, G.g FROM StructureView WHERE F.f='F2'
The output (see Figure 5
) excludes all fields in nodes B and E because they aren't in the SELECT
list. It slices those node types from the output structure while preserving their lower-level node types and any of their node data descendents (C, D, and G).
|Figure 5. Excluded Node Example: Excluding all fields in nodes B and C with the query "SELECT A.a, C.c, D.d, G.g FROM StructureView WHERE F.f='F2'" causes node promotion and node collection.|
In the output, the preserved node data descendents are attached to the excluded node type's parent node type (see Figure 5
), in a process called node promotion
. Notice in this example that node A gains an additional sibling pathway. This operation is known as node collection.
These hierarchical operations are standard for hierarchical processing and also occur naturally in hierarchical SQL queries.