his is the third in a series of DevX articles on the new hierarchical XML processing capabilities possible with navigationless database hierarchical processing. The first article covered the basics of processing full multipath hierarchical data queries, while the second article discussed combining hierarchical structures into larger hierarchical structures, and demonstrated a new, powerful, and unlimited way to mashup hierarchical structures. This article delves further into the topic, demonstrating advanced new multipath (nonlinear) hierarchical structure transformations. Like the previous articles, the processing follows correct hierarchical principles to derive correct hierarchical results.
Today, the terms “restructuring” and “reshaping” are used interchangeably for XML structure transformation processes. However, these two basic types of XML hierarchical structure transformations need to be separately distinguished, because they are different in meaning, results, and use.
- Restructuring is controlled by existing relationships in the data, while reshaping is controlled by the semantics of the current data structure.
- Restructuring is performed by using new and unused relationships to restructure the data while reshaping uses the semantics of the current structure to mold the structure into some other shape.
- Restructuring (using data relationships) can create a new structure and data with new semantics, while reshaping (using structure semantics) alters the structure without changing the data and its semantics.
Restructuring and reshaping both have their uses. You usually use restructuring to match a structure to its use in an application, and you usually use reshaping to map a structure to some desired format.
This article shows restructuring examples in SQL, which can perform the required operations without navigation or looping constructs. Specifying these SQL statements does require some thought—as you might expect for transformations. The results are correct because all operations are performed hierarchically by strictly following the hierarchical relationships in the SQL specified data structure. SQL’s hierarchical structure operation helps considerably in specifying the transformations, and simplifies designing complex transformations without introducing errors. The techniques and principles used here for restructuring are equally valid for navigational and procedural transformations used in other XML processors, but the full nonlinear transformations shown in this article may be too difficult to perform procedurally.
The SQL statements shown with the restructuring examples in this article can be executed using an ANSI SQL Transparent XML Hierarchical Processor prototype. The SQL examples can be modified. You can find the hierarchical processor prototype and directions for its use at http://www.adatinc.com/prototype.html.
Restructuring Using Data Relationships
|Figure 1. The StoreView View: The figure shows the deconstruction of the StoreView view into the two Cust and Emp sub-view fragments.
The example in Figure 1 uses a StoreView view, which joins the circled Cust and Emp sub-views to create a complex hierarchical structure that can be either a logical relational or physical XML structure, or heterogeneous combination. The example transforms this contiguous structure by isolating and manipulating structure fragments. Note that logical and physical structures appear the same in SQL working sets. A fragment is a connected subset of nodes from a hierarchical structure (which can contain multiple pathways) located either from or below the root. SQL handles this naturally and automatically. In other words, data field selection using the SELECT list non-procedurally specifies which nodes are to be selected for output from the query; all other required data fields are automatically referenced as needed. The SV1 fragment is hierarchically LEFT JOINED over the SV2 segment using this SQL:
SELECT SV1.EmpID, SV1.DpndID, SV2.CustID, SV2.InvID, SV2.AddrID FROM StoreView SV1 LEFT JOIN StoreView SV2 ON SV1.EmpCustID=SV2.CustID
Node promotion, a standard hierarchical operation, causes the structure fragments to become contiguous, letting you manipulate and join them easily into the structure being constructed using existing relationship values—all at a high hierarchical conceptual level new to SQL/XML processing. SQL’s relational processing performs the node promotion automatically because that is the way data selection (relational projection) also works.
|Figure 2. StoreView Decomposition and Transformation: The figure shows the process of selecting specific items from two fragments (identified as SV1 and SV2) to create a transformed result structure.
Figure 1 also shows the high hierarchical conceptual level of SQL processing. The separate fragments SV1 and SV2 (dotted circles) are translated into following SQL using the correlated names “SV1” and “SV2” as prefixes. Note how the prefixes are defined on the JOIN statement and used in the SELECT list; such usage is common in the other examples too. This uses SQL renaming/aliasing at the table and sub-view level, allowing fragments in the same structure to be isolated and referenced separately by their identifying assigned prefix. Figure 2 shows the selection process and the resulting structure.
Note that Figure 2 uses a number of different symbols used to define the hierarchical processing. These visual conventions are used throughout this article, specifically:
- A solid box indicates a selected node.
- A dashed box indicates an unselected node that is sliced out of the query returned result.
- A solid line connects nodes in the active structure.
- A dashed line connects nodes not in the active structure.
- A dashed arrow indicates the relationship join data points. If not specified, the solid arrow specifies the relationship data points controlling the join in addition to its other uses.
- A solid arrow represents the data modeling structure node linkage between the structures being joined. This is used to complete the unified data structure of all the structures joined, and represents the structure semantics that control query processing.
All the figures in this article use these conventions. The solid arrow that represents the data modeling structure linkages between structures being joined also usually specifies the ON clause data relationship linkage points. It is possible (for reasons explained later) that the ON clause data relationship linkage points may differ from those indicated by the data modeling solid arrow, in which case a dashed arrow indicates the data relationships linkage points.
An ANSI SQL Transparent Hierarchical XML Processor prototype can fetch the restructuring example SQL statements in this article, and execute them in real time. You can download this hierarchical processor prototype along with directions for using it. Feel free to modify the SQL examples to test their effects on the processing.
Hierarchical structure transformation as described in this section is known as “structure restructuring,” or simply as “restructuring.” You restructure physical or logical hierarchical structures by carving out separate fragments and then reassembling them differently. Pulling out separate fragments involves logically duplicating the structure by assigning different names or prefixes so the structures can be identified and accessed separately. This lets you retrieve multiple fragments that can be manipulated independently, thereby transforming the original structure. Note how simple and intuitive it is to specify the SQL query that creates the structure in Figure 2, and the hierarchical power it demonstrates.
Decomposition and transformation of logical and physical data structures is possible with standard SQL using a combination of fragment processing and SQL alias processing, letting a query access the structure view independently multiple times. Figure 2 uses the StoreView view as the single source for two separate fragments (the dotted circles in Figure 1). These reflect the data that comprises the CustView and EmpView. Essentially, these fragments decompose the StoreView and then remodel it, reshaping the data model. The reshaping operates by hierarchically re-joining the structure fragments (usually differently) based on other data value relationships in the structure.
While logical structures are free to be modeled in many different ways, physical input structures must be modeled to reflect the external structure being accessed. The StoreView in Figure 2 simply models a hierarchical structure—it could be either a physical XML or logical relational structure for this example—both are represented identically in the rowset. The example demonstrates that you can also rearrange a physical structure by separately selecting fixed fragments from the rowset containing it, and then independently join them into the main structure (unified view) with the flexibility offered by logical structures. Using the alias feature, you can give this new table or view a name and then reference that name in the SELECT clause as a high level prefix. Using the prefix enables you to target the exact object you want to reference, because the renamed objects may have the same name in both objects.
Using Alias and Structure Restructuring in a View
|Figure 3. Static Restructuring: This is the structure you get by selecting all fields from the Transform view.
The example in Figure 3 demonstrates a number of SQL capabilities applied to XML transformation processing. First, you could place its restructuring SQL query (shown earlier) in an SQL view for easy invocation and to demonstrate its flexible use in abstractions. Second, the column names InvID and AddrID can be aliased to create names reflected in generated XML. Here’s the previous example placed in a view that includes the new alias names:
CREATE View Transform AS SELECT SV1.EmpID EmpID, SV1.DpndID DpndID, SV2.CustID CustID, SV2.InvID Invoice, SV2.AddrID Address FROM StoreView SV1 LEFT JOIN StoreView SV2 ON SV1.EmpCustID=SV2.CustID
The preceding SQL assigns the aliases Invoice and Address to the InvID and AddrID columns. These new names would be reflected in XML generated from the resulting structure.
At this point, the transformation has been stored in an SQL view named Transform that you can invoke easily and use like any other view. You can ensure that it’s working by issuing a SELECT * FROM Transform query, which results in the structure shown in Figure 3. Later examples will demonstrate how to modify the transformation view dynamically when it’s invoked.
Restructuring Views with Dynamic Output Modification
|Figure 4. Dynamic Restructuring: The query does not select the Cust node, which removes them and promotes the Invoice and Address nodes.
Most XML transformation procedures today are static, so they must be modified to include or exclude a value in the output XML structure. A SQL-based transform does not need statically placed XML formatting operations because it uses a dynamic SELECT list, allowing simple and dynamic data item output inclusion or exclusion as in Figure 4, which uses the following SQL query:
SELECT EmpID, DpndID, Invoice, Address FROM Transform
Note that the query does not SELECT the Cust node, causing the Cust nodes to be removed and promoting the Invoice and Addr nodes around them, changing the already transformed structure. You can modify the dynamic SELECT list from query to query without changing the transformation logic contained in the Transformation view. This provides and incredibly powerful dynamic and reusable operation that’s both flexible and easy to use.
Changing Leg Order and Replicating Nodes
The next example (see Figure 5) starts with the full StoreView data structure from Figure 1 and transforms it in a number of ways. The StoreView view used in Figure 5 is composed of two views, CustView and EmpView. This transform operation breaks the CustView out of the StoreView by isolating each of its nodes. Having done this, it reassembles the Cust view and deliberately swaps its Addr and Invoice legs around by altering the order in which the joins are performed, which works because the structure is built from the top down, and left to right. You can see this change by comparing the result structure to the previous examples.
|Figure 5. Changing Leg Order: Using several joins, this query extracts the Cust view from the StoreView and then reassembles it, swapping the Address and Invoice legs and adding an additional Invoice node under Addr.
SELECT Cust.custid, Invoice.invid, Invoice.invcustid, Addr.addrid, Addr.addrcustid, invoice2.invid NewInv FROM Storeview Cust LEFT JOIN StoreView Addr ON Cust.custid=Addr.addrcustid LEFT JOIN StoreView Invoice ON Cust.custid=Invoice.invcustid LEFT JOIN StoreView Invoice2 ON Addr.addrcustid=Invoice2.invcustid
The replicated data caused by the combination of the EmpView and Custview data is automatically removed when the output data is converted to XML. In addition, the transformation selects an additional Invoice node and places it under the Addr node in Figure 5.
The previous restructuring examples used fragments to move data around. In contrast, this example operates at the node level. For complete control, the node level works well, but fragments are generally easier to work with, and also offer an important capability that node level does not. Often nodes in a group are already in the structure desired and the group may no longer contain the values (i.e. foreign keys) that would enable you to rejoin them. Fragments offer a solution because you can move them as a single group that remains intact as shown in the next example.
Restructuring Using Below-the-Root Linking
This next example links two multipath structure fragments by linking them below the root of the lower fragment. The example demonstrates how such linking opens up many more opportunities for combining fragments. The question is, is linking below the root valid, and if so, how is it handled semantically? It turns out that it is valid because it operates on the same principles as “linking below the root” described in the previous article in this series. The root of the lower structure still remains the hierarchical modeling link point.
|Figure 6. Restructuring using Below-the-Root Linking: Even though this query relies on the Cust root node, that node isn’t selected for output.
The SQL example below demonstrates isolating two fragments represented by the prefix “SV1” for the Emp fragment and “SV2” for the Cust fragment. The query links the fragments by matching the EaddrID and AddrID data values in the Eaddr and Addr nodes. This is allowed, and the Cust root of the lower level fragment is used as the data modeling entry point as shown (see Figure 6).
SELECT SV1.EmpID, SV1.DpndID, SV2.InvID, SV2.AddrID FROM StoreView SV1 LEFT JOIN StoreView SV2 ON SV1.EaddrID= SV2.AddrID
Interestingly, the example also shows that despite the query’s reliance on the Cust root node, that node does not need to be specified for output. This changes the question to “How is the Invoice node handled?” because Invoice is only indirectly related to the Addr node—through the common higher level node Cust node that is removed for output. The answer is that internally, the lower fragment is hierarchically combined with the upper fragment from the Eaddr node to the lower Cust root node. Then the unselected Cust and Eaddr nodes are removed and node collection takes place, connecting the Invoice and Addr nodes to the next higher level Emp node. The concept of node collection was also explained in the previous article in this series. This example demonstrates both the power of combining hierarchical operations and shows how they maintain the validity of hierarchical semantics.
The dashed arrow in Figure 6 represents the data linking between the two fragments, while the solid arrow represents the resulting structure.
Reshaping Structures Without Using Relationships
The examples you’ve seen so far have showed restructuring using data relationships to perform the required joins. However, if the data relationships are not available, there is still another way to perform the transformation—by utilizing the semantics in the data structure to move the desired node(s) to a new location on the structure. This feature allows any-to-any structure reshaping; however it’s a different operation than the restructuring operations you’ve seen, and will usually produce different results. Restructuring uses data relationships existing in the structure to introduce different semantics, while reshaping changes the hierarchical structure’s shape while retaining the same basic data semantics. Restructuring gives you the structure data semantics you want, and reshaping gives you the data structure you want while maintaining the semantics of the data.
You reshape a single structure using two or more copies of that structure, and left joining on the previous node compared to itself, repeating the operation for each node or fragment to be attached to the resulting structure. The example in Figure 7 reshapes a nonlinear multipath structure to a linear structure without using any matching relationships in the structure to be reshaped using this SQL:
|Figure 7. Reshaping Structures: Reshaping does not use data relationships and does not change the basic semantics of the data in the original structure.
SELECT SV1.EaddrID, SV2.EmpID, SV2.DpndID FROM EmpView SV1 LEFT JOIN EmpView SV2 ON SV1.EaddrID = SV2.EaddrID
In Figure 7, note that the first root node of the new reshaped structure is Eaddr, so this node gets moved to the result structure to start the process. Next, the first comparison compares this EaddrID to itself in both copies, to synchronize and position the copies. Then the second node, Emp, gets moved over from the synchronized copy using a LEFT JOIN, which produces Eaddr over Emp. This process relies heavily on linking below the root of the lower structure, which works out nicely since most of the time the node you are after is not the root.
Finally, the same processes can be repeated to bring the Dpnd node over by synchronizing on the Emp node. However, there’s a shortcut that can reduce the number of steps needed to move all the desired nodes over. Without the shortcut, building the output structure will always require a fixed number of steps—the number of nodes in the output structure minus one. Notice in Figure 7 that the Dpnd node is already positioned below the Emp node, so it can simply be moved over along with the Emp node, saving a synchronization join step. You can see this reflected in the SELECT statement in the SQL, where the lower-level SV2 prefix is used for both EmpID and DpndID to select them together.
One again, solid boxes represent selected items. The unselected dashed boxes are not moved to the output structure. You can then see how the reshaping was accomplished by looking at the selected fields from the combined structure. The LEFT JOIN operations both perform the data modeling and also serve to copy the needed data into place.
For additional information on Reshaping, see the article, “ANSI SQL Semantically Controlled Any-to-Any Data Structure Reshaping.”