ith the processing power of single-core chips running out of steam, hardware manufacturers have turned to multicore processors to make up for the lack of processing power increases in single-core chips. Unfortunately, developing multicore-capable applications is not a transparent or plug-and-play solution. Such applications require developers to use parallel programming techniques—which are not simple and require program design changes. On the plus side, increased funding for parallel processing research may lead to improved parallel processing design, programming, and testing techniques. This article explores an alternative approach that can automatically perform efficient and effective parallel processing of hierarchical data structures.
Before the advent of relational technology, hierarchical structures and their processing were used heavily in applications and databases. However, at that time, hierarchical structures were rigid, so they lost ground to the more flexible relational technology. But because of XML's ubiquity and flexibility, hierarchical structures are making a comeback. Hierarchical structures are both well-behaved and offer advanced processing capabilities. They also offer one unexplored capability examined here: Hierarchical structures and processing are fully and naturally parallelizable.
Because they're fully and naturally parallelizable, hierarchical structures can be processed in parallel automatically, which in turn enables automatic, efficient, and optimized use of multicore processors at a level not achieved before. This article shows how to implement a complete and automatic parallel processing solution for hierarchical query processing. Making the process automatic is critical, because hardware manufacturers aren't likely to stop at two, four, or eight cores in chips; future chips will have 64 or even 128 cores. The only way to use all these cores effectively is to make the process automatic.
Automatically Parallelizable Hierarchical Query Processing
Notice the naturally parallelizable nature of hierarchical data structures in Figure 1
. In this example, the nodes are named A through G, and their data occurrences are numbered. The nodes going down represent pathways. Also note that the pathways naturally spread out, creating more and more pathways—and that those pathways are independent. In other words, each pathway can be processed independently and at its own speed. In this example, each node in every path has two separate data occurrences, creating multiple data occurrences of each pathway.
|Figure 2. Parallel Processing From Start to Finish: The dashed lines in the figure show how the pathways through a hierarchical structure are naturally parellizable from data access through all processing stages.|
|Figure 1. Hierarchical Data Structure: The figure shows that hierarchical pathways are independent, and thus parallelizable.|
These same hierarchical structure pathways are used from data input, through processing, through data memory, to output, as shown in Figure 2
Each pathway can contain multiple data occurrences as shown—which are also naturally parallelizable giving an extra boost to parallel processing's advantages.
With the hierarchical data structure known at processing time, the parallel processing strategy can be determined automatically with no user intervention. In addition, this parallel processing strategy can be adjusted dynamically during execution as resources, inefficiencies, and better opportunities are determined. The automatic operation knows the exact parallel processing status and the status of the multiple cores at all times.
Because of the many separate hierarchical pathways, each with a possibility of a significant number of data occurrences, a large number of cores can be used independently and simultaneously, allowing maximum multicore utilization and efficiency. In addition, such automatic parallel processing is both provable and accurate because of hierarchical structures well known behavior and characteristics.