Navigationless Database XML: Hierarchical Data Processing

Although current databases limit XML processing to linear XPath or XQuery queries, existing hierarchical database techniques enable far more complex queries using standard SQL.

Navigationless Hierarchical Processing
The hierarchical database's unambiguous structure enables fully nonprocedural, navigationless processing based on natural hierarchical processing rules and principles. These rules and principles can be applied automatically to the hierarchical structure being processed, which eliminates manual navigation of the structure entirely. You've already seen this in action in the query examples in this article (see the SQL queries in Figure 4, Figure 5, and Figure 6). Navigationless processing enables many XML capabilities that are not available today and that can take XML database processing to the next generation.

Transparent Multi-Path Query Processing
The complex LCA (lowest common ancestor) coordination required for multi-path processing is too complicated and error-prone to be performed through procedural processing and manually specified navigation. Using nonprocedural processing and navigationless access, transparent multi-path hierarchical query processing can automatically process the entire multiple-pathway hierarchical structure (or whatever portion of it a nonprocedural navigationless query requires). This automatic navigationless processing handles both single-path and multi-path queries, so users do not need to be concerned with whether a query requires single or multiple pathways. For example, the SQL query that produces Figure 6 accesses four different paths: A/B/C, A/B/D, A/E/F, and A/E/G.
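The path and LCA bookkeeping that the query processor hides from users can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation. It assumes that the structure behind Figure 6 is node A with children B and E, where B has children C and D and E has children F and G. That shape is inferred from the four paths listed above. The names `STRUCTURE`, `paths`, `ancestors`, and `lca` are hypothetical.

```python
from typing import Optional

# Hypothetical parent -> children map for the structure behind Figure 6,
# inferred from its four paths: A/B/C, A/B/D, A/E/F, A/E/G.
STRUCTURE: dict[str, list[str]] = {
    "A": ["B", "E"],
    "B": ["C", "D"],
    "E": ["F", "G"],
    "C": [], "D": [], "F": [], "G": [],
}


def paths(node: str = "A", prefix: tuple = ()) -> list[str]:
    """Enumerate every root-to-leaf path below `node` as a slash-joined string."""
    current = prefix + (node,)
    children = STRUCTURE[node]
    if not children:
        return ["/".join(current)]
    result: list[str] = []
    for child in children:
        result.extend(paths(child, current))
    return result


def ancestors(node: str) -> list[str]:
    """Return the chain of node types from the root down to `node`, inclusive."""
    parent_of = {c: p for p, cs in STRUCTURE.items() for c in cs}
    chain = [node]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return list(reversed(chain))


def lca(x: str, y: str) -> Optional[str]:
    """Lowest common ancestor: the deepest node type on both root chains."""
    common = None
    for a, b in zip(ancestors(x), ancestors(y)):
        if a != b:
            break
        common = a
    return common


print(paths())        # ['A/B/C', 'A/B/D', 'A/E/F', 'A/E/G']
print(lca("C", "D"))  # B
print(lca("C", "F"))  # A
```

Here `lca("C", "F")` returns `"A"`. A query that touches both the A/B/C and A/E/F paths therefore has to be coordinated at A occurrences. A query that touches only C and D is coordinated at B.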

Enables XML Query by Non-Technical Users
Currently, only users trained in XML can specify queries that access native XML. Such queries require manual navigation and are limited to single, linear paths. With nonprocedural navigationless operation, query users need neither XML training nor knowledge of the hierarchical structure to specify a query. They are also not limited to requesting data from a single path of the structure. These query access and processing limitations disappear.

Supports Unlimited Internal Complexity
The more paths of the hierarchical structure a query accesses, the more complex its processing logic becomes. When multiple paths are navigated and processed manually, as they are today, the query user is responsible for that logic and must specify the hierarchical handling of multiple paths correctly. This complexity typically limits manually navigated queries to a single path of the structure. In contrast, nonprocedural navigationless queries perform the navigation and hierarchical processing automatically and transparently, regardless of the complexity required.

Correct Hierarchical Results
With manual navigation, query correctness also depends on the user. With nonprocedural navigationless processing, the results of a hierarchical query remain hierarchically correct regardless of how many query pathways are referenced or how much internal processing is required. This is because the natural rules and principles of hierarchical processing are applied internally, consistently, and automatically. This capability is currently missing from XML business processing, which requires precise results. Note that queries such as the one in Figure 6 are also more accurate because they are processed automatically.

Dynamically Increases Database's Data Value
As described earlier in this article, multi-path queries automatically use the additional semantics that exist between the different legs of the structure being accessed. This dynamically increases the value of the data being processed. For example, any query that accesses data in one path of the structure based on data in another path takes advantage of the semantics between the two pathways to resolve the query. In Figure 2, a condition on the data occurrence E2 qualifies the path containing the data occurrence B1. This gives the result a richer meaning that would not be possible without multi-path processing. You can see the increase in data value in Figure 6, which involves very complex multi-path LCA processing.
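The following Python sketch shows how a condition on one path can qualify data on another path through their LCA. It is a simplified model built on stated assumptions. The occurrence values A1, A2, B1, B2, E1, and E2 are hypothetical stand-ins, not the actual data in Figure 2, and `Occ` and `select_across_paths` are illustrative names. Because B and E meet at A, a condition satisfied by E2 qualifies the A occurrence above it. That A occurrence in turn qualifies B1 beneath it.

```python
from dataclasses import dataclass, field


@dataclass
class Occ:
    """One data occurrence in a hierarchical record (hypothetical model)."""
    node_type: str
    value: str
    children: list["Occ"] = field(default_factory=list)

    def find(self, node_type: str) -> list["Occ"]:
        """All occurrences of `node_type` at or below this occurrence."""
        found = [self] if self.node_type == node_type else []
        for child in self.children:
            found.extend(child.find(node_type))
        return found


def select_across_paths(roots: list[Occ], lca_type: str, out_type: str,
                        cond_type: str, cond_value: str) -> list[str]:
    """Return `out_type` values whose LCA occurrence (of type `lca_type`)
    also has a `cond_type` occurrence equal to `cond_value`."""
    result: list[str] = []
    for root in roots:
        for anchor in root.find(lca_type):
            # The condition on one path qualifies the whole LCA occurrence...
            if any(o.value == cond_value for o in anchor.find(cond_type)):
                # ...which in turn qualifies the output data on the other path.
                result.extend(o.value for o in anchor.find(out_type))
    return result


# Hypothetical data: A1 owns B1 and E2; A2 owns B2 and E1.
data = [
    Occ("A", "A1", [Occ("B", "B1"), Occ("E", "E2")]),
    Occ("A", "A2", [Occ("B", "B2"), Occ("E", "E1")]),
]
print(select_across_paths(data, "A", "B", "E", "E2"))  # ['B1']
```

This corresponds roughly to selecting B data with a WHERE condition on E. A full hierarchical processor would apply the same LCA logic to every combination of paths a query references.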

Increases Number of Different Queries Possible
The same multi-path processing capabilities that increase data value also mean that the number of meaningful queries possible for a given hierarchical structure becomes practically unlimited, because so many different combinations of paths are possible. Each of these multi-path queries dynamically increases the value of the data it uses. The widely varying SQL query examples in Figure 4, Figure 5, and Figure 6 illustrate how many different queries are possible against the same structure or view.

Powerful Hierarchical Optimization
The next three navigationless hierarchical processing capabilities require powerful optimization techniques, which in turn depend on the flexibility of navigationless processing.

A nonprocedural navigationless interface can easily analyze an entire query for optimization. For example, queries that need access to only certain portions of the full structure can use a SAX interface instead of a DOM interface. (A DOM interface usually reads the entire structure into memory before processing. A SAX interface streams through the document, so only the required node types need to be processed and retained.) Additionally, optimization can use knowledge of the entire structure, together with knowledge of exactly what the query requires, to create an intelligent access strategy. That strategy automatically determines when to save input data for possible reuse and when to reuse data space.
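To show the SAX side of this trade-off, here is a minimal sketch using Python's standard `xml.sax` module. The handler class `SelectiveHandler` and the sample document are hypothetical. The sketch also assumes that the requested elements are leaf elements containing only text. The parser still streams every event, but the handler keeps text only for the element names a query needs, so nothing else is held in memory.

```python
import xml.sax
from typing import Optional


class SelectiveHandler(xml.sax.ContentHandler):
    """Keep text only for element names a query needs (hypothetical helper).

    Assumes the wanted elements are leaf elements containing only text.
    """

    def __init__(self, wanted: set[str]) -> None:
        super().__init__()
        self.wanted = wanted
        self.results: dict[str, list[str]] = {name: [] for name in wanted}
        self._current: Optional[str] = None  # wanted element currently open
        self._buffer: list[str] = []         # text chunks for that element

    def startElement(self, name, attrs):
        if name in self.wanted:
            self._current = name
            self._buffer = []

    def characters(self, content):
        # Called for every text chunk; ignored unless inside a wanted element.
        if self._current is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self._current:
            self.results[name].append("".join(self._buffer).strip())
            self._current = None


# Hypothetical document shaped like the A/B/C, A/B/D, A/E/F, A/E/G structure.
doc = b"<A><B><C>c1</C><D>d1</D></B><E><F>F2</F><G>g1</G></E></A>"
handler = SelectiveHandler({"C", "F"})
xml.sax.parseString(doc, handler)
print(handler.results["C"], handler.results["F"])  # ['c1'] ['F2']
```

With `{"C", "F"}` as the wanted set, only the C and F text is retained. The content of B, D, E, and G is skipped as it streams past.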

Global Hierarchical Views and Reuse
Global views that describe entire structures have previously been discouraged because of the overhead involved. But query flexibility combined with hierarchical optimization and navigationless processing means that global views can be reused optimally. Users can therefore specify any number of different queries against a single global view without added overhead. These users do not have to be aware of the structure in the view or keep track of specific smaller views. They need only specify the data they want returned, which can reside anywhere in the global structure view. The hierarchical optimization discussed earlier in this article avoids accessing unneeded pathways, which also eliminates global view overhead. As an example of the many different queries possible, the SQL queries in Figure 4, Figure 5, and Figure 6 differ in the data accessed, the filtering applied, the internal processing, and the resulting output structure. All of them use the same global view with no overhead.
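The pruning that removes global-view overhead can be illustrated with a small Python function. This is a hypothetical sketch, not the article's optimizer. It assumes that the global view is the A/B/C, A/B/D, A/E/F, A/E/G structure, and that the query references node types C and F anywhere (in its output list or its WHERE clause). Only the referenced node types and the ancestors that connect them to the root need to be accessed. The rest of the view can be skipped. `PARENT` and `required_nodes` are illustrative names.

```python
# Hypothetical global view: the A/B/C, A/B/D, A/E/F, A/E/G structure
# expressed as a child -> parent map.
PARENT = {"B": "A", "E": "A", "C": "B", "D": "B", "F": "E", "G": "E"}


def required_nodes(referenced: set[str]) -> set[str]:
    """Node types that must be accessed: the referenced ones plus every
    ancestor needed to connect them to the root. Everything else in the
    global view can be skipped."""
    needed: set[str] = set()
    for node in referenced:
        # Walk upward until we reach the root or a node already collected.
        while node not in needed:
            needed.add(node)
            if node not in PARENT:  # reached the root
                break
            node = PARENT[node]
    return needed


all_nodes = set(PARENT) | set(PARENT.values())
need = required_nodes({"C", "F"})
print(sorted(need))              # ['A', 'B', 'C', 'E', 'F']
print(sorted(all_nodes - need))  # ['D', 'G']
```

For this query, the D and G legs of the global view never need to be read.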

Global Hierarchical Queries
Today, programmers rarely attempt to apply query filtering to an entire XML document except in critical applications. With nonprocedural navigationless processing, however, such filtering presents no problem. The SQL example accompanying Figure 4 performs complex filtering on the entire hierarchical structure and then outputs the entire structure. The global optimization described previously can apply advanced dynamic memory management, which is very important for processing this global query efficiently. In SQL, this global query is easily expressed using the SELECT ALL operation, as shown in the SQL examples in Figure 4 and Figure 6.

Focused Retrieval with Result Aggregation
Information Retrieval (IR) currently lacks the ability to correctly identify meaningful XML documents and then return only the desired data in a meaningful way. Hierarchical data filtering can locate only the meaningful documents and select only the desired aggregated results. In addition, the global view, with its powerful hierarchical processing, allows any query to be specified. Figure 5 demonstrates these capabilities. Its query locates documents qualified by the filtering condition WHERE F.f = 'F2', and then aggregates and structures only the specified output data, based on the semantics of the hierarchical structure processed.
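A simplified Python sketch of this two-step pattern follows, using the standard `xml.etree.ElementTree` module. It assumes hypothetical documents in which each node's data field is stored as a lowercase attribute, so F.f becomes `<F f="..."/>`. It also qualifies whole documents rather than applying full LCA semantics. `DOCS` and `focused_retrieval` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents; each node's data field is stored as an attribute
# named after the node in lower case (F.f -> <F f="..."/>).
DOCS: dict[str, str] = {
    "doc1": '<A a="A1"><B b="B1"><C c="C1"/></B><E e="E1"><F f="F2"/></E></A>',
    "doc2": '<A a="A2"><B b="B2"><C c="C2"/></B><E e="E2"><F f="F3"/></E></A>',
    "doc3": '<A a="A3"><B b="B3"><C c="C3"/><C c="C4"/></B>'
            '<E e="E3"><F f="F2"/></E></A>',
}


def focused_retrieval(docs: dict[str, str], cond_node: str, cond_field: str,
                      cond_value: str, out_node: str,
                      out_field: str) -> dict[str, list[str]]:
    """Keep only documents where some `cond_node` has `cond_field == cond_value`,
    then return just the requested `out_node` data, grouped by document."""
    results: dict[str, list[str]] = {}
    for name, text in docs.items():
        root = ET.fromstring(text)
        # Step 1: locate meaningful documents (e.g. WHERE F.f = 'F2').
        if not any(el.get(cond_field) == cond_value for el in root.iter(cond_node)):
            continue
        # Step 2: return only the desired data for each qualifying document.
        results[name] = [el.get(out_field) for el in root.iter(out_node)]
    return results


print(focused_retrieval(DOCS, "F", "f", "F2", "C", "c"))
# {'doc1': ['C1'], 'doc3': ['C3', 'C4']}
```

Documents doc1 and doc3 qualify because each contains an F whose f value is F2. Only their C values are returned, grouped per document.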

Going Forward
This article argued that structured XML processing in today's databases falls short, because it requires different processing from the markup-oriented processing currently used by default. Such capabilities already exist and were thoroughly vetted in hierarchical databases over three decades ago. By combining nonprocedural navigationless processing, dynamic data modeling, dynamic XML-formatted output, and unlimited query capability, databases can open the door to new hierarchical processing capabilities that take full advantage of the power inherent in hierarchical structures.

Michael M. David is the founder of Advanced Data Access Technologies, Inc. Previously, he was a staff scientist and the lead XML architect for NCR/Teradata, and served as their representative to the ANSI SQLX Group. He has more than 25 years of experience researching and designing commercial nonprocedural heterogeneous database hierarchical query processing products using flat, relational, and hierarchical data. He authored the book Advanced ANSI SQL Data Modeling and Structure Processing, as well as numerous papers and articles on this subject. You can find additional information on hierarchical data structures, principles, navigationless processing, and automatic processing as well as a demo.