Hierarchical Query Driver and Parallel Processing Controller
The input query is preprocessed by generating the query tasks and run time control blocks necessary to perform the query described in the previous section. The preprocessed query can be saved and executed later using the SQL SELECT
list to control the operation. This section covers the software design needed to execute the preprocessed query in parallel.
|Figure 4. Query Parallel Processor: At the heart of the processor, the Parallel Processor Controller controls dynamic parallelization.|
depicts the design of the Query Parallel Processor. It controls the input data access, processing tasks, memory use, and query result output—all of which can be run in parallel and controlled dynamically using a Parallel Processor Controller (PP CNTL in Figure 4
The Query Driver coordinates and drives the entire query processing in Figure 4. It knows what must be done from the information supplied in the Query Processing Control Block. It controls the Input Controller, Output Processor, Memory Manager, and the Task Driver. The Input Controller controls the data access routines. The Output Processor outputs the query result data in structured XML. The Memory Manager manages the hierarchical memory linkages and optimizes memory usage. The Task Driver controls the Query Tasks and their concurrent execution by the Task Interpreter, which processes the pseudo code. The Parallel Process Controller controls dynamic parallelization.
The Input Controller and the XML and relational data access routines in Figure 4 access the required data efficiently when needed, retrieving only data along the pathways actually being used (as discussed earlier). The XML access routine avoids the standard DOM and SAX parsers and uses its own access routine, optimized especially for database hierarchical access used by SQL. Multipath data access can be performed in parallel. Other possible data sources can be flat files hierarchically modeled and other physical hierarchical sources such as structured VSAM and IBM's still popular IMS.
The Memory Manager also manages the hierarchical data linkages as shown in Figure 1, and optimizes memory use by reusing memory as soon as possible. The hierarchical data structure consists of multiple data occurrences at each node. Sometimes all of a node's data occurrences' need to be all in memory; at other times only the current node occurrence needs to be in memory and then its memory space can be reused to hold the next data occurrence. If data will be reused multiple times, it is kept until no longer needed, which can help with storage efficiently for large complex queries. The basic idea behind this operation is to be memory efficient, but to never have to retrieve the same data more than once. Proper data management can save a great deal of memory.
The Output Processor formats and outputs the result of the query result in structured XML. This is possible because the processor is aware of the structure being processed, so it knows the output structure as well. Unlike the other tasks shown in Figure 4, normally Output Processing must wait until the query completes. However, Output Processing can be overlapped in a pipeline fashion controlled at each root node data occurrence. This achieves concurrent parallel processing. Basically, the Output Processor can output XML for each data occurrence of the root node overlapping query processing with the output procedure. This works except when a specific ordering at the root level is required, and the data is not already ordered or indexed in order. Overlapping the output is particularly useful when the output is being displayed interactively by reducing the display time lag. This also allows the first screen of data to be presented quickly, without having to wait for the complete query to complete.
The Query Driver in Figure 4 controls the execution order of the query tasks to ensure a correct result. It also controls the level of parallelization by running some tasks concurrently as determined by the Parallel Processor (PP) Controller. The Query Driver has a thread assigned for system tasks that it may loan out for small non-system tasks.
The Parallel Processor Controller in Figure 4 does more than determine which tasks can be run concurrently. It also performs all the required concurrency operations. Other Parallel Processing considerations not already covered consist of:
- Keeping track of available resources and deciding where to allocate scarce resources
- Allocating and assigning threads in pools to avoid the overhead of closing and launching threads
- Dynamic load balancing performed by thread stealing, which also avoids the overhead of closing and launching threads
- Longer paths are given priority, so the processor can perform more work for same amount of overhead
- Assigns threads first to tasks that do not already have a thread (Amdahl's Law of diminishing returns)
- Limits threads to a given task to three (also because of Amdahl's Law)
- Waits on events, not by time period, which results in more efficient use of resourses
- Resource synchronization at active node fork points, which are hierarchical processing coordination points
- Considering aggregation operations for partitioning (multiple aggregations are already parallelizable when on different paths, and each can still be partitioned for parallel processing if worthwhile)
- Still operates efficiently on single core systems by taking advantage of read ahead and I/O overlapping
Many applications can be written easily as hierarchical applications to take advantages of their extra capabilities and processing advantages. Parallelizing applications is both difficult and requires extra design and testing. Automatic generation of parallel processing applications relieves programmers of this difficult and error-prone task. Developing an advanced and dynamic parallel processing controller is not justifiable for a single parallel processing application, but is worth the effort when amortized across many different applications or unlimited new queries. Such a processor is generic, allows experimental tuning, and holds out the possibility of future technology improvements as they occur.
Hierarchical multi-core query processing opens the door to extremely powerful and internally complex queries that can be specified without prior knowledge of the queried structure. With the ability to easily specify more complex queries, and utilize more cores, even more powerful parallel processor controllers will arise that can fully and efficiently use the ever-increasing available power of multiple cores.