Browse DevX
Sign up for e-mail newsletters from DevX


Run Oracle Stored Procedures in Parallel—Inside the Database-2 : Page 2

If you need a solution for running Oracle stored procedures in parallel, don't reinvent the wheel with solutions that run outside the database. Execute stored procedures in parallel inside the database using Oracle's native APIs and PL/SQL.




Building the Right Environment to Support AI, Machine Learning and Deep Learning

Using Oracle Native APIs: DBMS_JOB Package
Oracle databases provide a rich set of prepackaged APIs in the form of Supplied PL/SQL Packages. I browsed the Oracle 9.2 documentation and found references to 101 packages, although issuing the following query returns 320:

select count(*)
  from dba_objects
 where object_type = 'PACKAGE BODY'
   and owner = 'SYS'

See the complete list at Oracle9i Supplied PL/SQL Packages and Types Reference, Release 2 (9.2). Access requires an Oracle.com login.

Before you start programming your own code, always check Supplied PL/SQL Packages. Most of the time, you'll find a useful API for your task, which the Oracle development team has already developed (and tested). The DBMS_JOBpackage is one of those APIs. It's Oracle's native scheduler, which provides a set of APIs for submitting and managing jobs in a queue.

How does it help with parallel execution? Well, if you submit multiple jobs to a job queue and schedule them all to start immediately, then they will start and run in parallel. The useful API in this case is DBMS_JOB.SUBMIT. It submits jobs to a job queue for execution. The next_dateparameter of DBMS_JOB.SUBMIT determines the next date when the job will be run. It has a default value of SYSDATE, so if you don't specify a date, the job will start immediately. However, the job will be submitted to the job queue only after commit, which makes this process transaction-based.

You won't find this functionality in the UNIX shell scripts. Scripts run with no transaction knowledge. As a matter of fact, these scripts will start (and subsequently fail) even if the database is down. DBMS_JOB-based code has no such problems because it lives inside the database.

Another problem with the UNIX-based scripts is security. Because they reside outside of the database, they have to contain a password to connect to the database. Hard-coding passwords in scripts is not a secure solution. The same problems apply to Java-based code, since it also lives outside the database.

Another big advantage of the DBMS_JOB-based solution is load balancing. A system-modifiable database parameter, job_queue_processes, determines the maximum number of concurrent jobs the solution can execute. In another words, it controls how many jobs can be run simultaneously. Consider the following scenario:

  1. You develop code that detects out-of-balanced indexes, which can happen as a result of massive delete operations.
  2. You want to rebuild them, and to save time you decide to do it in parallel.

Now, how many indexes do you rebuild in parallel at the same time, knowing the number of out-of-balance indexes will vary? For example, if you have three "bad" indexes, you'd rebuild them all at the same time. However, if you have 30 indexes to rebuild and only a four-CPU server, you'd want to build only some (maybe eight) indexes simultaneously and process more after the previous ones get rebuilt. That's exactly what the job_queue_processesparameter enables you to do.

The DBA sets this parameter based on the database server resources and workload. For example, a DBA would set a four-CPU machine to 8:

alter system set job_queue_processes = 8;

As a result, when you rebuild 30 indexes, all 30 jobs will be submitted to the queue but only eight will start. As soon as a job finishes, the next job from the queue starts. Only eight indexes are rebuilt concurrently to conserve database server resources.

Control Concurrency
DBMS_JOB is a nice way to execute jobs in parallel because you control the degree of concurrency. You could set up to 36 job queue processes (in 8i) or 1,000 (in 9i) and that would be the degree of concurrency.

The performance benefits could be substantial. For example, I did a performance test on two 1-million-row tables with four indexes to rebuild. The following snippet is the first block of code I used, which rebuilds indexes with NOLOGGING PARALLEL options serially using the RebuildIndex procedure of my Maintenance package:

   vStart number;
   vStart := dbms_utility.get_time;
   Maintenance.RebuildIndex('ORDERS_PK', 'NOLOGGING PARALLEL');
   Maintenance.RebuildIndex('SALES_PK',  'NOLOGGING PARALLEL');
   Maintenance.RebuildIndex('ORDERS_AK1','NOLOGGING PARALLEL');
   Maintenance.RebuildIndex('SALES_AK1', 'NOLOGGING PARALLEL');
   dbms_output.put_line('Elapsed time: ' ||
      round((dbms_utility.get_time - vStart)/100, 2) || ' hsecs' );

The second block of code rebuilds indexes in parallel using DBMS_JOB:


   vJob number;
   dbms_job.submit(job  => vJob,
                   what => 'Maintenance.RebuildIndex(''ORDERS_PK'',
                    ''NOLOGGING PARALLEL'');');
   dbms_output.put_line('Index ORDERS_PK: Job ' || to_char(vJob));

   dbms_job.submit(job  => vJob,
                   what => 'Maintenance.RebuildIndex(''SALES_PK'',
                    ''NOLOGGING PARALLEL'');');
   dbms_output.put_line('Index SALES_PK: Job ' || to_char(vJob));

      dbms_job.submit(job  => vJob,
                   what => 'Maintenance.RebuildIndex(''ORDERS_AK1'',
                    ''NOLOGGING PARALLEL'');');
   dbms_output.put_line('Index ORDERS_AK1: Job ' || to_char(vJob));

   dbms_job.submit(job  => vJob,
                   what => 'Maintenance.RebuildIndex(''SALES_AK1'',
                    ''NOLOGGING PARALLEL'');');
   dbms_output.put_line('Index SALES_AK1: Job ' || to_char(vJob));

The above block executes almost immediately and produces the following output:

Index ORDERS_PK:  Job 10
Index SALES_PK:   Job 11
Index ORDERS_AK1: Job 12
Index SALES_AK1:  Job 13

However, these results don't mean the indexes got rebuilt that fast. Indexes get built in the background. To view the progress, you can query the user_jobsview:

select job, total_time
  from user_jobs
 where job in (10, 11, 12, 13);

The output of this query shows that jobs 10–13 have been running for two seconds:

10      2
11      2
12      2
13      2

If you keep querying for a while, some jobs will finish and disappear from the query output:

12    127
13    127

Jobs 10 and 11 are done. Indexes ORDERS_PK and SALES_PKare rebuilt, and it took somewhere between 117 and 127 seconds (since you keep querying every 10 seconds). When the above query returns no rows, the index rebuilding jobs are complete. The second block of code, which rebuilds indexes in parallel, executed 30–35 percent faster than the first block. That's a significant performance gain.

Now that the execution has completed, you can tell one piece of code is definitely missing from the parallel execution block listed previously. It needs a procedure that automates the process for determining execution completion. The way it currently works is not elegant at all: opening another SQL*Plus session and issuing the same query every 10 seconds from that session. (See the WaitForCompletion procedure in JobsPkg in Listings 1 and 2.)

A possible solution would be a procedure that polls the user_jobsview in a loop until it returns no rows. But again, as I said at the beginning, there's no need to reinvent the wheel. Oracle provides two packages to communicate between different database sessions: DBMS_ALERT and DBMS_PIPE. I've chosen DBMS_ALERT, because when a job completes, it sends an "I'm done" signal to the main block of code that initially submitted that job to a job queue.

Comment and Contribute






(Maximum characters: 1200). You have 1200 characters left.



Thanks for your registration, follow us on our social networks to keep up-to-date