Creating Hierarchical Data Structure Mashups

NSI SQL can naturally perform full multi-path hierarchical data query processing that far exceeds the single linear path processing performed by XPath and XQuery today (see the article “Navigationless Database XML: Hierarchical Data Processing“). That article covers the basics of full hierarchical processing and describes how SQL can inherently perform this advanced processing. This article describes new ANSI SQL hierarchical capabilities that can perform semantically correct advanced hierarchical data structure joining and mashups.

While the previous article concentrated on using the SQL LEFT OUTER JOIN operation to model and process hierarchical structures—both logical relational and physical XML structures—in individual views, this article demonstrates how you can join these heterogeneous hierarchically complex structure types together dynamically and seamlessly into a larger hierarchical structure. You’ll also see how standard SQL hierarchical structure joining also naturally supports unrestricted data structure mashups that still follow standard hierarchical principles. A powerful and advanced new hierarchical data modeling capability makes this possible.

SQL Hierarchical Structure Views
When integrating relational and XML data, it is important to get the correct result without any data loss. Unfortunately this has not yet been achieved. Today’s processing introduces inaccurate results that inhibit correct hierarchical XML processing in databases. However, by modeling relational flat data as hierarchical structures, it becomes possible to integrate relational data with XML without any loss of data accuracy. Figure 1 demonstrates this by defining a hierarchically modeled XML view named “XML,” and a relationally modeled hierarchical view named “RDB.”

 
Figure 1. Hierarchical Data Modeling: The figure shows a hierarchical data model in both SQL and XML.

The difference in the XML and relational view in Figure 1 is the View keyword. In the XML view definition (XVIEW), a middleware XML processing extension receives control, accesses the native XML data, and maps XML element names to SQL table names and XML element and attribute data names to SQL column names. Because XML is a contiguous physical structure, it does not require foreign data keys to define its structure, so the SQL ON clause linkage clause in the XML View uses table names rather than column names to make the node connections as shown in the XML view in Figure 1. This is a physical XML view definition, but it’s treated as a SQL logical view after the data is retrieved, enabling seamless heterogeneous operation of relational and XML data.

In both SQL views in Figure 1, the LEFT OUTER JOIN operation preserves the left data argument even when there is no matching right argument, meaning that a parent node can exist without a child node, but a child node can not exist without a parent node. Notice in Figure 1 that you can define single paths by following a path of nodes (the RDB view) or define multiple paths by joining multiple nodes to the same parent node (the XML view).

As you can see from Figure 1, if a relational RDB hierarchical path occurrence has no existing R node then nodes D and B are never linked. On the other hand, if an R and D node exists than a partial path is possible—so variable length paths are possible in the relational view. In the XML view, if an X node does not exist then node M and L are never linked, but if X and M nodes exists than one path can exist while the other does not.

Heterogeneous Views
The SQL below hierarchically joins the two hierarchical SQL views from Figure 1 to create a unified heterogeneous logical view called “RDBXML.” SQL view JOIN operations are not performed until the view is invoked in real time. That invocation transparently performs the relational/XML data integration seamlessly and automatically. 

    CREATE VIEW RDBXML AS    SELECT * FROM  RDB    LEFT JOIN XML ON R.r=X.x

The preceding code places the relational RDB view structure hierarchically over the XML view structure and connects them using a single LEFT OUTER JOIN operation. This is possible because entire hierarchical views are being joined together, not just joined one single node at a time as in Figure 1. The technique provides incredible power. Figure 2 shows the result structure and its expanded SQL.

Joining Hierarchical Structures Hierarchically

 
Figure 2. Expanded RDBXML View and Structure: The SQL shown treats the the logical and physical structures exactly the same.

The heterogeneous view in Figure 2 automatically generates the SQL LEFT OUTER JOIN sequence that defines its combined XML and RDB structures shown in Figure 2 when invoked. This causes their stored RDB and XML views to be expanded into a single unified LEFT OUTER JOIN sequence. Notice how the lower level XML view is nested under the upper relational RDB view in its expanded SQL view expansion in Figure 2. In particular, notice the automatic placement of the last ON clause pushed to the end of the multiple LEFT OUTER JOIN view expansion. This ON clause placement causes the lower view structure to be fully defined and materialized (built) before encountering the final ON clause which joins the fully formed XML lower structure to the already materialized upper relational view.

The above delayed structure joining happens whether the lower structure is logical or physical. In the case of a lower level physical structure, this is exactly what is required to naturally process fixed physical structures which naturally operate as fully materialized structures. This allows logical and physical structures to be processed exactly the same. This is performed automatically by ANSI SQL.

Figure 2 demonstrates an expansion of the XML and RDB views into a unified logical view using the LEFT OUTER JOIN command, preserving the combined semantics of both views. Typically, with both XML and relational processing, the integration solution is to flatten the hierarchical data to resemble relational data—losing valuable hierarchical semantics. As you can see from Figure 2, this solution does the opposite.

Elevating the relational data to a hierarchical level increases accuracy and opens up many new capabilities. For example, the elevation makes the relational/XML data integration seamless and increases the semantics used in the SQL query. This in turn makes the query more powerful because it can automatically handle multi-path queries. This increases the value of the user’s data by taking advantage of additional semantic data that exists between the different pathways of the accessed structure, and greatly increases the number of different queries possible from the same multi-path structure.

Dynamic Structure Joining
Combining data structures does not need to be performed beforehand statically. You can dynamically specify the combining of structures. For example, the SQL statement in Figure 3 shows a query that uses the relational and XML views directly to dynamically combine structures. The dynamic query in Figure 3 is a good example of distributing hierarchical data modeling between stored views and users to add flexibility. The more complex relational and XML views that define the basic view structures can still be utilized flexibly. Users can easily join them in an ad hoc or interactive fashion using a single LEFT OUTER JOIN operation. You can interactively join the views in any number of ways; for example, by joining the XML structure over the relational structure instead of under it, or changing their connection points to use other data relationships which can also form other structures.

 
Figure 3. Dynamic Hierarchical Structure Join: This example shows how a user can create a single join point to combine two complex hierarchical structures.

Hierarchical Optimization

 
Figure 4. Omitting Unneeded Nodes: The query specifies a list of required nodes, so the optimized SQL can omit the LEFT JOIN on the M node, because it’s not needed to reach any of the listed nodes.

You aren’t limited to querying complete hierarchical structures. In Figure 4, the SELECT list specifies that only specific nodes should be returned; unselected nodes (such as D and M) are not output. But a simple join of the XML and RDB views would produce unnecessary nodes (such as node M). Fortunately, hierarchical structures can remove unnecessary pathways without affecting the results. This also means that they can automatically support large global views where unnecessary optimized out nodes in the view cause no overhead.

Compared to Figure 3, the query in Figure 4 removes the LEFT OUTER JOIN for the M node, because it’s not selected for output, and therefore is not necessary to the query. However, the D node is still required for the query even though it is not selected for output because you need it to navigate to the B node (which is selected for output). 

Figure 5 shows the three structure stages that hierarchical data processing can go through using the query from Figure 4. The first structure in Figure 5 defines and joins the basic input structure(s). If semantic structure optimization cannot be applied to the joined structure based on its runtime query, then this is the structure that the query processes. Otherwise, the joined structure is optimized and then processed as indicated in Figure 5. After processing, the result structure can be further modified semantically to remove nodes that were not selected for output, but were required for processing (such as the D node).

 
Figure 5. Processing Structure Phases: Optimization can change the structure.

The rightmost output structure in Figure 5 shows the D node sliced out—also removed from the internal relational processing result set, causing its child B node to be appended to D‘s parent (node R). This process is known as node promotion for hierarchical processing and projection for relational processing, which preserves the semantics of the desired output data structure. This natural mapping of the SQL query in Figure 4 to operate hierarchically makes the SQL to XML mapping operation seamless. This is possible because SQL and XML are both operating hierarchically, and the SQL internal rowset result and the external output hierarchical result shown in Figure 5 are both hierarchical. That creates a one-to-one operational mapping—which can also be used to generate XML output automatically. This optimization and automatic XML output has also been achieved in the middleware XML extension mentioned previously.

The automatic XML output processing described above is not currently supported by XML processors on the market. These processors support more of a preformatted static XML output format. But with SQL’s dynamic data modeling of hierarchical structures and its structure-aware processing, dynamic XML output has many advantages missing today in the XML industry. Beside hierarchical optimization, these include hierarchical processing accuracy and dynamic control of the output such as the node promotion shown in the Output Structure of Figure 5 which is controlled by the SQL SELECT clause used in Figure 4.

Structure-Aware Processing
A number of advanced operational SQL features and capabilities occur through the processing phases shown in Figure 5 that are not present in other XML query languages. This is possible because the expanded LEFT OUTER JOIN data modeling consists of the hierarchical data modeling instructions that define the full structure being accessed in Figure 4 enabling structure-aware processing. When separate structures are joined into a single hierarchical structure so are their separate LEFT OUTER JOIN sequences, automatically performing this complex real-time processing required for structure-aware processing.

By determining which pathways of the query are not necessary, the unnecessary pathway processing can be removed easily and precisely by eliminating the join operation for the unneeded nodes. This global optimization process is not being performed elsewhere today because XML today uses only user navigation and the processing logic used contains procedural coding instructions as with XQuery’s user-specified looping logic, making global optimization difficult. In addition, XML processors like XQuery allow non-hierarchical operations such as the inner join operation which invalidates hierarchical structures.

Hierarchical Structure Mashups
Previously, joined data structures linked only to the root of the lower-level structure, to keep the modeled structure valid hierarchically (as in Figure 3). Linking to nodes below the root of the lower-level structure has been problematic, because it’s difficult to determine the new derived structure and its semantics. For example, the query in Figure 6 joins the relational structure (RDB) to the lower XML structure by linking below its root to the L node based on the ON clause relationship. This is possible because the lower level structure has already been fully materialized when joining views. On its own, this creates an invalid hierarchical structure, because the L node is accessible from multiple paths (X and R). In other words, the semantics of the structure in Figure 7 are ambiguous as shown.

 
Figure 6. Structure Mashup and Relationships: Joining node R to node L (below the root of the structure XML) seemingly creates an ambiguous structure, because L can be reached via two paths.
 
Figure 7. Correct Data Modeling: This mashup query illustrates linking below the root.

However, by matching the results to the semantics of the result something very interesting and quite intuitive happens. The semantics of the resulting new structure always behaves as if the lower level structure’s root were linked to as shown in Figure 7. Because this is already standard processing, it requires no special changes in the hierarchical processing other than recognizing this new situation in the data modeling process and adjusting for it as indicated. This new hierarchical data modeling finding means that hierarchical structures can be combined based on relationships anywhere in the data to mashup the data structures using ANSI SQL.

This new hierarchical data modeling finding is semantically accurate. In Figure 7, the lower structure is already a valid hierarchical structure, based on its root (X), which does not change when it’s combined with another structure. This enables the upper level structure can be related to the lower structure by matching values anywhere in the lower structure.

The breakthrough with this hierarchical data modeling finding is that it always uses the root of the lower level as its link point for determining the combined structure. This keeps the lower level structure’s semantics stable. The lower level structure has been filtered to represent only the values that reflect the linking requirements in the associated ON clause, (R.r=L.l in Figure 7).

In Figure 7 the XML structure has been filtered based on a value in its lower level L node, but the connection point to the upper level is at its root node X and not its L node. This makes sense because node L is below node X; you have to access node X to locate the correct L node. So the new structure preserves correct semantics. This also means that you can reference and filter multiple lower level nodes correctly using an ON clause specifying R.r=L.l AND R.r=M.m. In that case the resulting structure would be the same, but with more complex automatic data filtering of the lower level. You can see this is extremely powerful, and it can be performed interactively.

To sum up, mashing up structures should not change their basic structure. Changes to the basic data structure should be done by transformation operations. The new data modeling semantics of linking below the root for data mashups also helps supports advanced new transformation capabilities, which I’ll discuss in a future article. For additional information on this new level of SQL/XML hierarchical integration discussed in this article and to test it online yourself, try this demo.

Share the Post:
Share on facebook
Share on twitter
Share on linkedin

More From DevX