devxlogo

Optimize the Massive DELETE Operation in Oracle, Part 2

Optimize the Massive DELETE Operation in Oracle, Part 2

y previous 10-Minute Solution “Optimize the Massive DELETE Operation in Oracle, Part 1” described the technique for performing a high-performance massive DELETE operation as a direct-path INSERT. At the end of the article, I outlined six steps for implementing this technique:

  • Step 1
    Create a table (TestData_X) with identical structure to an existing table (TestData) but without primary key, alternate key(s), and foreign key(s) constraints. Create it with the NOLOGGING option.
  • Step 2
    Insert the data you need to keep into the newly created table using a direct-path INSERT.
  • Step 3
    Create indexes on the new table with NOLOGGING PARALLEL options.
  • Step 4
    Create constraints on the new table. Use the ENABLE NOVALIDATE option for referential integrity and check constraints.
  • Step 5
    Swap the TestData and TestData_X tables: rename TestData to TestData_T, and rename TestData_X to TestData.
  • Step 6
    Restore the original logging settings. Verify the results and drop the TestData_T table.

As you can see, the steps are rather general. You could automate them by creating a set of APIs that accepts the table name as a parameter. I developed a SwapOps package that delivers the functionality of the above six steps.

How do I complete a massive DELETE operation in my Oracle database without having to pay the heavy performance overhead?

Perform the massive DELETE operation as a swap delete with parallel execution. This technique can offer significant performance gains over a straight or in-place delete operation.

The SwapOps Package
The SwapOps package code extracts metadata from database objects, such as tables, indexes, constraints, etc. A previous 10-Minute Solution, “Use an Oracle API to Extract and Store Database Objects’ DDL“, described a MetaDataPkg package that allowed you to automate the process of extracting objects’ definitions and storing them in memory structures (associative arrays). Since then, I made a small modification to that package by adding one more attribute (aSwapName) to type tMetaObject:

type tMetaObject is record(   aName         tString,   aType         tString,   aLogging      tString,   aParallel     tString,   aStatus       tString,   aValidated    tString,   aRely         tString,   aSwapName     tString,   aDDLString    tLongString);

The aSwapName attribute holds the name for a swap object (e.g., for the TestData table, the aSwapName value will be TestData_T, and for the TestData primary key TestData_PK, the value will be TestData_PK_T, etc.).

The accompanying source codecontains the MetaDataPkg package’s specification and body.

Now, let’s look at the SwapOps package’s specification:

create or replace package SwapOps as  /*   * Constants   */   cSwapSuffix constant char(2)  := '_X';   cTempSuffix constant char(2)  := '_T';  /*   * GetMeta   */   function GetMeta   (      pName  in MetaDataPkg.tString,      pForce in boolean := false   )   return MetaDataPkg.tFullMetaObject;  /*   * CreateObjects   */   procedure CreateObjects   (      pName    in MetaDataPkg.tString,      pType    in MetaDataPkg.tString,      pLogging in MetaDataPkg.tString := MetaDataPkg.cNo   );  /*   * RenameObjects   */   procedure RenameObjects   (      pName       in MetaDataPkg.tString,      pType       in MetaDataPkg.tString,      pFromSuffix in MetaDataPkg.tString,      pToSuffix   in MetaDataPkg.tString   );  /*   * ExecuteInsert   */   procedure ExecuteInsert   (      pName  in MetaDataPkg.tString,      pWhere in MetaDataPkg.tString := null   );  /*   * ExecuteSQL   */   procedure ExecuteSQL   (      pSQLString in MetaDataPkg.tLongString   );  /*   * SwapObjects   */   procedure SwapObjects   (      pName in MetaDataPkg.tString,      pDrop in MetaDataPkg.tString := MetaDataPkg.cNo   );  /*   * RestoreSettings   */   procedure RestoreSettings   (      pName         in MetaDataPkg.tString,      pLogging      in MetaDataPkg.tString := MetaDataPkg.cNo,      pIndexLogging in MetaDataPkg.tString := MetaDataPkg.cNo   );  /*   * FullInsert   */   procedure FullInsert   (      pName         in MetaDataPkg.tString,      pWhere        in MetaDataPkg.tString := null,      pLogging      in MetaDataPkg.tString := MetaDataPkg.cNo,      pIndexLogging in MetaDataPkg.tString := MetaDataPkg.cNo,      pDrop         in MetaDataPkg.tString := MetaDataPkg.cNo   );  /*   * FullSQL   */   procedure FullSQL   (      pName         in MetaDataPkg.tString,      pSQLString    in MetaDataPkg.tLongString,      pLogging      in MetaDataPkg.tString := MetaDataPkg.cNo,      pIndexLogging in MetaDataPkg.tString := MetaDataPkg.cNo,      pDrop         in MetaDataPkg.tString := MetaDataPkg.cNo   );  /*   * DropObject   */   procedure DropObject   (     pObjectName  in MetaDataPkg.tString,     pObjectType  in MetaDataPkg.tString,     pTableClause in MetaDataPkg.tString := null   );  /*   * LockTable   */   procedure LockTable   (      pTableName in MetaDataPkg.tString,      pMode      in MetaDataPkg.tString := 'EXCLUSIVE'   );   function GetSwapName   (      pName    in MetaDataPkg.tString,      pSuffix  in MetaDataPkg.tString := cSwapSuffix   )   return MetaDataPkg.tString;   function GetRenameString   (      pFromName in MetaDataPkg.tString,      pToName   in MetaDataPkg.tString,      pType     in MetaDataPkg.tString,      pTable    in MetaDataPkg.tString := null   )   return MetaDataPkg.tDBString;end SwapOps;/

At the top, I defined two constants: cSwapSuffix (‘_T’) and cTempSuffix (‘_X’). That’s where you could modify the package if you wanted to use you own scheme for naming swap objects and temp objects.

I used the following procedures to automate the six steps I outlined at the beginning. Their names are self-explanatory:

  • The CreateObjects() procedure for Steps 1, 3, and 4
  • The ExecuteInsert() or ExecuteSQL() procedures for Step 2
  • The SwapObjects() procedure for Step 5
  • The RestoreSettings() procedure for Step 6

The following is the difference between the ExecuteInsert() and ExecuteSQL()procedures:

  • ExecuteInsert() accepts a string for the WHERE clause of an INSERT statement, constructs that statement, and then executes it.
  • ExecuteSQL() is more general. It accepts any full SQL string, and you can use it to execute complex INSERT statements or other non-INSERT SQL statements such as UPDATE.

The SwapOps package also contains two wrapper procedures: FullInsert() and FullSQL(). These procedures call the above procedures?CreateObjects(), ExecuteInsert() or ExecuteSQL(), SwapObjects(), RestoreSettings()?in the right order.

A SwapOps Demonstration
This section shows an example of how to use the SwapOps package. In “Optimize the Massive DELETE Operation in Oracle, Part 1” I created a table (TestData) with 500K+ rows that was logically partitioned by values in a runId column. If you run the following query:

select runId, count(*)  from TestData group by runId;

You’ll get the following output:

     RUNID   COUNT(*)---------- ----------         1      20496         2      20496         3      20496         4      20496         5      20496         6      20496         7      20496         8      20496         9      20496        10      20496        11      20496        12      20496        13      20496        14      20496        15      20496        16      20496        17      20496        18      20496        19      20496        20      20496        21      20496        22      20496        23      20496        24      20496        25      20496

Delete the data produced by the first 20 runs?rows with runId values between 1 and 20. The code for this is very simple: just one call of the FullInsert()procedure with two parameters, tableName (‘TestData’), and a WHERE clause string for rows you want to keep (‘runId > 20’):

begin   SwapOps.FullInsert('TestData', 'runId > 20');end;/

Now, run the first query again:

select runId, count(*)  from TestData group by runId;

You’ll get this output:

     RUNID   COUNT(*)---------- ----------        21      20496        22      20496        23      20496        24      20496        25      20496

As you can see, the data from the first 20 runs is gone. The default value for the pDrop parameter FullInsert()procedure is ‘no’. That means the original table you renamed TestData_T was not dropped. For safety reasons, you may want to verify the results first and then drop it.

Now you can compare the performance of a straight or in-place delete operation to that of a swap delete.

In-place Delete vs. Swap Delete Performance
The first test deletes the results from runs with even runIDs (2, 4, 6, …, 24), which will be roughly 50 percent of the data in the TestData table:

select sum(case when mod(runId, 2) = 0 then 1 else 0 end) as To_Delete,        sum(case when mod(runId, 2) > 0 then 1 else 0 end) as To_Keep   from TestData; TO_DELETE    TO_KEEP---------- ----------    245952     266448

Run an in-place delete first:

delete   from TestData  where mod(runId, 2) = 0;245952 rows deleted.Elapsed: 00:00:39.87

Next, run a swap delete:

begin   SwapOps.FullInsert('TestData', 'mod(runId, 2) > 0');end;/Elapsed: 00:00:14.28

The in-place delete took roughly 41 seconds, while the swap delete took slightly more than 14 seconds!

The TestData table has one primary key index and one non-unique index. I added one more non-unique index, then another non-unique index to test the index factor. The numbers in Table 1 show that the swap delete kept outperforming the in-place delete.

In-place Delete (sec.)Swap Delete (sec.)
Table with one primary key index and one non-unique index39.8714.28
Table with one primary key index and two non-unique indexes52.3924.12
Table with one primary key index and three non-unique indexes64.1834.90
Table 1. Results of Swap Delete vs. In-place Delete Performance Test

Is In-place Delete Better Than Swap Delete?
Deleting almost half of the data (48 percent) obviously is much faster via a swap delete, but what about when the percentage of data you want to delete goes down (40, 25, 10, etc.)? At which point does an in-place delete become more attractive than a swap delete? Well, the less data you need to delete and the more data you need to insert, the more work you’ll have to do to rebuild indexes. For example, deleting 10 percent of the data via a swap delete means that 90 percent will be inserted in a new table and indexes will be rebuilt on 90 percent of the original data. Rebuilding indexes is a time-consuming operation. Believe or not, it’s the biggest time component of the swap delete process: during its 15-second total in Table 1, 5 seconds went to the actual insert and 9 seconds were spent on index creation.

The current code in the SwapOps package rebuilds indexes serially, one after another. One way to speed up that step is to rebuild the indexes in parallel using Oracle’s DBMS_JOB package, which I discuss it later in the article. (See Oracle9i Supplied PL/SQL Packages and Types Reference Release 2(9.2) for detailed information on DBMS_JOB package.)

In the case of partitioned indexes, you can achieve parallel functionality by utilizing the DBMS_PCLXUTIL package. It provides intra-partition parallelism for creating partition-wise local indexes. (See Oracle9i Supplied PL/SQL Packages and Types Reference Release 2(9.2) for detailed information on DBMS_PCLXUTIL package.)

I ran the performance tests on the TestData table with the following where clauses:

mod(runId, 2) = 0  ? 48% rows to delete, 52% - to keep (initial test)mod(runId, 3) = 0  ? 32% rows to delete, 68% - to keepmod(runId, 4) = 0  ? 24% rows to delete, 76% - to keepmod(runId, 5) = 0  ? 20% rows to delete, 80% - to keepmod(runId, 8) = 0  ? 12% rows to delete, 88% - to keep

Table 2 shows the performance results for the TestData table with one primary key index and one non-unique index.

Rows to Delete (%)/Rows to Keep (%)In-place Delete (sec.)Swap Delete (sec.)
48/5239.8714.28
24/7624.9619.16
20/8020.1220.88
12/8812.6424.12
Table 2. Performance Results for the TestData Table (one primary key index, one non-unique index)

The results show that the threshold for when an in-place delete becomes more attractive than a swap delete is very close to 20 percent: all delete operations that remove less then 20 percent of rows are better done by with in-place delete, and any delete operations that remove 20 percent rows or higher are faster via a swap delete.

Since rebuilding indexes is the most time-consuming component of the swap delete process, I believe the number of indexes in a table is a mayor factor that affects the threshold?the more indexes a table has, the higher threshold is. To test that theory, I added two more non-unique indexes on the TestData table:

create index TestData_IdNameon TestData(  runId,  objectName);create index TestData_IdDateon TestData(  runId,  createDate,  status  );

Table 3 shows the performance results for the TestData table with one primary key index and three non-unique indexes.

Rows to Delete (%)/Rows to Keep (%)In-place Delete (sec.)Swap Delete (sec.)
48/5264.1834.90
32/6848.9436.91
24/7633.4438.21
20/8027.4639.83
12/8817.6144.72
Table 3. Performance Results for the TestData (one primary key index, three non-unique indexes)

The results show that the threshold is higher for the heavier indexed TestData table. It moved from 20 percent to more than 25 percent.

Indexes are just one mayor factor that affects the threshold value. Other factors that may influence the threshold value include table width/number of columns, CLOB/BLOB columns existence, and partitioned tables.

Keep in mind that the actual threshold value may change from one database server to another?especially if they have different configurations that affect their logging performances. I ran the same tests on another database server with less efficient logging than the first and its threshold value came out lower, around 15 percent.

In-place Delete and Rebuilding of Indexes
In order to make a fair comparison between in-place delete and swap delete, you have to consider one additional step after an in-place delete: rebuilding indexes. In my experience as an Oracle database developer, I’ve seen some cases where massive delete operations severely affected indexes and the only way to resolve the problem and get indexes back in shape was index rebuilds. (I described such a case in the “Amazing Results” section of the 10-Minute Solution “Detecting and Fixing Out-of-Balance Indexes“.) Because of that, a truly fair comparison for massive delete operations would be in-place delete plus index rebuild vs. swap delete. Keep in mind that a swap delete does not need to rebuild indexes because it creates them from scratch during Step 3. That makes swap delete even more attractive and moves the threshold number considerably lower. I found that if you need to rebuild indexes after an in-place delete, then a swap delete outperforms an in-place delete?even if you need to delete just 10 percent of the rows in the table (See Table 4 for these results).

Rows to Delete (%)/Rows to Keep (%)In-place Delete (sec.)Swap Delete (sec.)In-place Delete + Index Rebuild (sec.)
12/8812.6424.1229.18
Table 4. Results for In-place Delete Plus Index Rebuild vs. Swap Delete

Using Parallel Execution with Swap Delete
I mentioned earlier in this article that as part of the swap delete process indexes get rebuilt one after another. You can improve that by rebuilding indexes in parallel using Oracle’s native DBMS_JOB package. (See my 10-Minute Solution “Run Oracle Stored Procedures in Parallel?Inside the Database” for more details.) However, I’ve chosen to use parallel execution per table granularity, executing a swap delete operation for each table in parallel. I had a bad performance problem: one batch process deleted data from the previous runs of 20+ tables?one after another. To illustrate the solution, create four clones of TestData table (TestData1 ? TestData4). To save testing time, each table has a primary key and only one non-unique index.

First, execute the old code scenario: an in-place delete, one table after another. This listing shows the old code:

delete   from TestData1  where mod(runId, 2) = 0;commit;delete   from TestData2  where mod(runId, 2) = 0;commit;delete   from TestData3  where mod(runId, 2) = 0;commit;delete   from TestData4  where mod(runId, 2) = 0;commit;Execution time: 148.67 sec.

For that article, I created a SwapDemo package with just one procedure (Delete) that accepted table name as a parameter. The code listing uses the JobPkg package’s SignalCompletion API (described in “Run Oracle Stored Procedures in Parallel?Inside the Database“).

The following is the SwapDemo package specification:

create or replace package SwapDemo is   procedure Delete   (      pTable varchar2   );end SwapDemo;/

The following is the SwapDemo package body:

create or replace package body SwapDemo is   procedure Delete   (     pTable varchar2   )   is   begin      SwapOps.FullInsert(pTable, 'MOD(RUNID, 2) > 0');      JobsPkg.SignalCompletion('SwapDemo.Delete(''' || pTable || ''')');   end Delete;end SwapDemo;/

Now you are ready to test the new solution, which combines the swap delete technique with the power of parallel execution.

The Proof Is in the Performance
The following listing shows a PL/SQL code block that demonstrates a combination of the swap delete technique and parallel execution.

declare   vJobs JobsPkg.tArrayJob;begin   vJobs(1) := 'SwapDemo.Delete(''TestData1'')';   vJobs(2) := 'SwapDemo.Delete(''TestData2'')';   vJobs(3) := 'SwapDemo.Delete(''TestData3'')';   vJobs(4) := 'SwapDemo.Delete(''TestData4'')';   JobsPkg.Execute(vJobs);         JobsPkg.WaitForCompletion(vJobs);end;/

The above block produced the same data results as the in-place delete code from the “old code” listing, but with 88 percent performance gain. Execution time was down to 18.47 seconds from 148.67 seconds. The real-life original code was more complicated, but switching to the swap delete technique with parallel execution produced a significant performance gain: for 20+ tables with millions of rows and 50 percent of data to delete, the execution time went down from 20+ hours to just 22 minutes. That clearly shows the solution for a high-performance massive delete operation.

devxblackblue

About Our Editorial Process

At DevX, we’re dedicated to tech entrepreneurship. Our team closely follows industry shifts, new products, AI breakthroughs, technology trends, and funding announcements. Articles undergo thorough editing to ensure accuracy and clarity, reflecting DevX’s style and supporting entrepreneurs in the tech sphere.

See our full editorial policy.

About Our Journalist