SQL Hierarchical Parallel Processing—A Good Target
Because SQL can automatically process hierarchical structures, it can naturally support a high level of parallel processing, controlled by a parallel processor controller that automatically assigns tasks to multiple cores. Is it worth the effort to build such a controller for SQL hierarchical processing?
SQL has a huge audience: it is by far the most popular database query language. So if SQL can be made to exploit multicore parallelism automatically, letting developers work in a familiar language, the approach deserves serious consideration.
Hierarchical structures are useful because they organize data in a way that increases in value geometrically as additional nodes are added. Most applications can be written to use hierarchical structures. In addition, the natural creation of hierarchical structures automatically acts as the parallel processing design phase for multicore applications.
Hierarchical XML integration with SQL hierarchical processing is transparent and allows nontechnical users to build powerful queries. Hierarchical optimization enables the use of global views so users don't need to be concerned with the structure of the data being processed, and also adds significant efficiency and reusability—which should also attract more users.
The combination of hierarchical and parallel processing can also support full multipath hierarchical processing. To obtain even greater benefits, SQL operations that were previously limited to linear processing, such as aggregation, ORDER BY, and GROUP BY, can be extended to operate hierarchically across the multiple pathways: each path can carry out its own aggregation, ordering, or grouping operation, independently of the other pathways in the multipath hierarchical structure.
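The per-pathway independence described above can be sketched in a few lines of Python. The pathway names and operations below are hypothetical, chosen only to illustrate that each path can run its own aggregation, ordering, or grouping concurrently because the paths share no data:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical multipath structure: each pathway holds its own node values.
pathways = {
    "orders": [120, 75, 310],             # path 1: aggregate with SUM
    "customers": ["b", "a", "c"],         # path 2: ORDER BY
    "regions": ["east", "west", "east"],  # path 3: GROUP BY (count per group)
}

def aggregate(path, values):
    # Each pathway applies its own operation, independent of the others.
    if path == "orders":
        return path, sum(values)
    if path == "customers":
        return path, sorted(values)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return path, counts

# The pathways share no data, so they can run concurrently without locks.
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(lambda kv: aggregate(*kv), pathways.items()))
```

Because each worker touches only its own pathway, no coordination is needed beyond collecting the per-path results at the end.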
Parallel Processing Review
Current high-level programs capable of automatic parallel processing can only identify a well-known group of parallelizable processes involving sets of data. The remaining program logic is difficult to identify automatically, so it requires a parallel programming and design stage or it is left unparallelized.
There are three basic methods to parallelize a program and make it perform faster: identifying independent tasks, partitioning data, and pipelining. If your application involves large sequential tasks, they can be broken up into smaller tasks that run in parallel. If your program is data oriented, the data can be organized into separate horizontal or vertical chunks so that the different data ranges can be processed in parallel. Both of these approaches speed up the execution of your application. Pipelining overlaps multiple executions of the entire application; it does not make a single execution faster, but it greatly increases throughput. Applications that loop back on themselves can apply this same overlapping technique internally, gaining throughput because the overlapped stages reduce overall execution time. All three methods can be used in the hierarchical processing application described in this paper.
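Of the three methods, data partitioning is the easiest to show concretely. The sketch below is a minimal, hypothetical example of horizontal partitioning: the rows are split into ranges, each range is processed by its own worker, and the partial results are combined at the end:

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(1, 101))  # 100 "rows" of data, values 1..100

def chunk_sum(chunk):
    # Each worker independently processes one horizontal data range.
    return sum(chunk)

# Horizontal partitioning: split the rows into four equal ranges.
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(chunk_sum, chunks))

# Combine the partial results into the final answer.
total = sum(partials)
```

The same split-process-combine shape applies whether the per-chunk work is a sum, a sort, or a more elaborate query operation.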
The more independent tasks that can be run in parallel, the greater the benefit. However, simply assigning more cores to a given task yields less and less return for each additional core (Amdahl's law). For example, suppose you have two different sort processes that can be run at the same time. Both have an equivalent amount of data to sort, so the two sorts are well balanced, and each can be parallelized by partitioning its data. Assume that you have four available cores. The application of Amdahl's law means that allocating three cores to the first sort and only one to the second sort is not as efficient as assigning two cores to each sort.
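The two-sort example can be checked with a simple model. Assuming idealized linear speedup within each sort (an assumption, ignoring the per-core diminishing returns Amdahl's law describes), the job finishes when the slower sort finishes, so the balanced split wins:

```python
def completion_time(work, cores):
    """Time to finish independent tasks given a core split.

    Simplified linear-speedup model: each task takes work/cores
    units of time, and the job completes when the slowest task does.
    """
    return max(w / c for w, c in zip(work, cores))

work = [12.0, 12.0]  # two equally sized sorts

unbalanced = completion_time(work, (3, 1))  # three cores vs. one core
balanced = completion_time(work, (2, 2))    # two cores each
```

Here the unbalanced split takes 12 time units (the one-core sort dominates), while the balanced split takes only 6, illustrating why spreading cores evenly over well-balanced independent tasks is the better allocation.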
Spreading cores over independent tasks fits extremely well into hierarchical structure processing. The required tasks can be determined automatically based on the user's query, and assigned to many processors, each working independently because of the independent pathways.
SQL Hierarchical Engine for Increased Parallel Processing
On large servers, SQL databases can automatically execute multiple instances of the query processor in parallel. On personal computers, SQL processors can automatically parallelize certain set operations, as already described. To achieve full and complete parallel processing, one more improvement is necessary: when SQL is operating hierarchically, the relational engine can be transparently replaced with a true hierarchical engine. The replacement is transparent because developers still write queries in ANSI SQL and the results are the same. Hierarchical engines can be faster, however, because they eliminate data duplication overhead and, most importantly, they can process the different hierarchical pathways in parallel much more efficiently and thoroughly.
Figure 2 shows how the structure being processed is retrieved and modeled in the Data Storage Access step, then stored in memory in the Data Memory step, which models the data structure.
The Parallel Processing step then processes the data in memory, retrieving and storing the data as needed. Maximum parallel processing can be achieved because the pathways of the modeled structure are independent and can be processed concurrently. The data can even be retrieved from storage in parallel.
SQL Query Input Preprocessing
This section covers the preprocessor portion of the SQL Driven Hierarchical Engine; the following section covers its query processing. Basically, the preprocessor creates separate processing tasks that can be performed in parallel, while the query processor dynamically determines the best way to perform those tasks in parallel based on the currently available resources.
|Figure 3. SQL Query Preprocessing: In this figure, the dashed boxes represent operational data, the solid boxes represent software processes and the arrows represent data movement.|
Figure 3 shows the SQL Query preprocessor. The software processes take in the operational data and produce new operational data. The process involves generating the parallel processing tasks and the runtime control blocks.
Walking through the SQL Query Preprocessing in Figure 3, the SQL Query statement is first parsed. Parsing not only produces the required processing in an internal form for easy access, but also externalizes the structure of the hierarchical data structure being processed. These two outputs are used together to build tasks that can be executed in parallel to perform the query. The information needed to run these tasks together to process the query, and to fully exploit their parallel processing capability, is stored in the control blocks generated by the Control Block Builders: the data Input, data Memory, and Processing Control Block Builders and the control blocks they produce. How these dynamically produced elements are brought together and controlled is covered in the following Query Driver section.
The SQL Query Preprocessing shown in Figure 3 can itself be performed in parallel. Internalizing the SQL request and determining the data structure being processed can proceed concurrently. Determining the runtime data structure also naturally performs access optimization: unnecessary pathways are simply not recorded in its control block, as if they never existed. Further down the parallel paths, the separate Control Block Builders can run concurrently. On the other parallel side, the parallel tasks builder is reentrant and can be invoked in parallel to build multiple tasks concurrently. These tasks consist of pseudocode that allows their operation to be adapted to the current processing environment. The automatic mechanism for dynamically controlling the concurrency level of parallel processing, described in the next section, may also be used for the preprocessor's own concurrency.
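The concurrent Control Block Builders can be sketched as follows. The builder functions, their names, and the parse-tree fields are all hypothetical; the point is that the builders read a shared parse tree but write disjoint outputs, so they can run in parallel without interfering:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical builders for the three control blocks named in Figure 3.
def build_input_cb(parse_tree):
    return {"kind": "input", "tables": parse_tree["tables"]}

def build_memory_cb(parse_tree):
    return {"kind": "memory", "layout": list(parse_tree["tables"])}

def build_processing_cb(parse_tree):
    return {"kind": "processing", "ops": parse_tree["ops"]}

# Hypothetical internalized form of a parsed SQL query.
parse_tree = {"tables": ["cust", "order"], "ops": ["join", "select"]}

# The builders only read the shared parse tree and each produces its
# own control block, so they can all run concurrently.
builders = [build_input_cb, build_memory_cb, build_processing_cb]
with ThreadPoolExecutor() as pool:
    control_blocks = list(pool.map(lambda b: b(parse_tree), builders))
```

A reentrant tasks builder would follow the same pattern: invoke the same function concurrently, once per task, with no shared mutable state.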
There are three levels of concurrency for hierarchical processing. The first level, used only for very large queries, is to distribute pieces of the query to other processors or servers. Hierarchical processing makes this easier: because the query operates on a hierarchically modeled data structure, it can be broken apart and later reassembled along hierarchical boundaries, and the distributed SQL query pieces will themselves automatically operate hierarchically at their new locations. The second level is task parallelization along the different pathways, together with data partitioning. The third level is parallelization along a single pathway to handle multiple data occurrences concurrently. The first and third levels are optional, depending on the query and the available multicore resources.
The previous three levels of concurrency all use their own core processor(s), but another technique, overlapping I/O with read-ahead, offers a tremendous payback without requiring an additional dedicated core. It reduces very expensive wait states on a core by starting an I/O call and then continuing with processing while the much slower I/O completes. This can itself be considered a concurrent process, because the I/O call performs peripheral device processing on the I/O device.
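Read-ahead with overlapped I/O can be demonstrated with a background reader thread and a bounded buffer. The block sizes and the simulated delay are hypothetical; the pattern is that the next block is being fetched while the CPU processes the current one:

```python
import queue
import threading
import time

def reader(blocks, buf):
    # Read-ahead thread: fetch the next block while the CPU works.
    for block in blocks:
        time.sleep(0.01)  # simulated slow I/O delay
        buf.put(block)
    buf.put(None)  # end-of-data sentinel

blocks = [[1, 2], [3, 4], [5, 6]]
buf = queue.Queue(maxsize=2)  # bounded read-ahead buffer
threading.Thread(target=reader, args=(blocks, buf), daemon=True).start()

totals = []
while (block := buf.get()) is not None:
    totals.append(sum(block))  # CPU work overlaps the next read
```

The bounded queue keeps the reader only slightly ahead of the consumer, so memory use stays fixed no matter how much data flows through.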
In memory, hierarchical data storage has many parallel processing advantages. First, no duplicate data is required, as it is with relational processing engines. Figure 1 also shows how the structured data memory is hierarchically laid out. It demonstrates how hierarchical data memory naturally supports the independent pathways used by their specific tasks without conflicting with the other tasks shown in Figure 2. In this way, different concurrent tasks can update data memory at the same time. The layout of the hierarchical structure should also help limit caching conflicts between different cores by keeping the memory areas of the different concurrent tasks far enough apart.
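Lock-free concurrent updates along independent pathways can be illustrated with a small in-memory tree. The tree shape and pathway names are hypothetical; each task mutates only the subtree it owns, so the updates never conflict:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical hierarchical store: each pathway owns a disjoint subtree.
tree = {
    "root": {
        "path_a": [1, 2, 3],
        "path_b": [10, 20, 30],
    },
}

def update_path(path):
    # Each task writes only inside its own subtree, so no locking
    # is needed between concurrent updaters.
    subtree = tree["root"][path]
    for i, value in enumerate(subtree):
        subtree[i] = value * 2
    return path

with ThreadPoolExecutor() as pool:
    done = list(pool.map(update_path, ["path_a", "path_b"]))
```

Because ownership follows the pathway boundaries, the same structure that enables parallel querying also enables parallel updating.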