y previous 10-Minute Solution “Optimize the Massive DELETE Operation in Oracle, Part 1” described the technique for performing a high-performance massive DELETE operation as a direct-path INSERT. At the end of the article, I outlined six steps for implementing this technique:
- Step 1
Create a table (TestData_X) with identical structure to an existing table (TestData) but without primary key, alternate key(s), and foreign key(s) constraints. Create it with the NOLOGGING option.
- Step 2
Insert the data you need to keep into the newly created table using a direct-path INSERT.
- Step 3
Create indexes on the new table with NOLOGGING PARALLEL options.
- Step 4
Create constraints on the new table. Use the ENABLE NOVALIDATE option for referential integrity and check constraints.
- Step 5
Swap the TestData and TestData_X tables: rename TestData to TestData_T, and rename TestData_X to TestData.
- Step 6
Restore the original logging settings. Verify the results and drop the TestData_T table.
As you can see, the steps are rather general. You could automate them by creating a set of APIs that accepts the table name as a parameter. I developed a SwapOps package that delivers the functionality of the above six steps.
How do I complete a massive DELETE operation in my Oracle database without having to pay the heavy performance overhead?
Perform the massive DELETE operation as a swap delete with parallel execution. This technique can offer significant performance gains over a straight or in-place delete operation.
The SwapOps Package
The SwapOps package code extracts metadata from database objects, such as tables, indexes, constraints, etc. A previous 10-Minute Solution, “Use an Oracle API to Extract and Store Database Objects’ DDL“, described a MetaDataPkg package that allowed you to automate the process of extracting objects’ definitions and storing them in memory structures (associative arrays). Since then, I made a small modification to that package by adding one more attribute (aSwapName) to type tMetaObject:
type tMetaObject is record( aName tString, aType tString, aLogging tString, aParallel tString, aStatus tString, aValidated tString, aRely tString, aSwapName tString, aDDLString tLongString);
The aSwapName attribute holds the name for a swap object (e.g., for the TestData table, the aSwapName value will be TestData_T, and for the TestData primary key TestData_PK, the value will be TestData_PK_T, etc.).
The accompanying source codecontains the MetaDataPkg package’s specification and body.
Now, let’s look at the SwapOps package’s specification:
create or replace package SwapOps as /* * Constants */ cSwapSuffix constant char(2) := '_X'; cTempSuffix constant char(2) := '_T'; /* * GetMeta */ function GetMeta ( pName in MetaDataPkg.tString, pForce in boolean := false ) return MetaDataPkg.tFullMetaObject; /* * CreateObjects */ procedure CreateObjects ( pName in MetaDataPkg.tString, pType in MetaDataPkg.tString, pLogging in MetaDataPkg.tString := MetaDataPkg.cNo ); /* * RenameObjects */ procedure RenameObjects ( pName in MetaDataPkg.tString, pType in MetaDataPkg.tString, pFromSuffix in MetaDataPkg.tString, pToSuffix in MetaDataPkg.tString ); /* * ExecuteInsert */ procedure ExecuteInsert ( pName in MetaDataPkg.tString, pWhere in MetaDataPkg.tString := null ); /* * ExecuteSQL */ procedure ExecuteSQL ( pSQLString in MetaDataPkg.tLongString ); /* * SwapObjects */ procedure SwapObjects ( pName in MetaDataPkg.tString, pDrop in MetaDataPkg.tString := MetaDataPkg.cNo ); /* * RestoreSettings */ procedure RestoreSettings ( pName in MetaDataPkg.tString, pLogging in MetaDataPkg.tString := MetaDataPkg.cNo, pIndexLogging in MetaDataPkg.tString := MetaDataPkg.cNo ); /* * FullInsert */ procedure FullInsert ( pName in MetaDataPkg.tString, pWhere in MetaDataPkg.tString := null, pLogging in MetaDataPkg.tString := MetaDataPkg.cNo, pIndexLogging in MetaDataPkg.tString := MetaDataPkg.cNo, pDrop in MetaDataPkg.tString := MetaDataPkg.cNo ); /* * FullSQL */ procedure FullSQL ( pName in MetaDataPkg.tString, pSQLString in MetaDataPkg.tLongString, pLogging in MetaDataPkg.tString := MetaDataPkg.cNo, pIndexLogging in MetaDataPkg.tString := MetaDataPkg.cNo, pDrop in MetaDataPkg.tString := MetaDataPkg.cNo ); /* * DropObject */ procedure DropObject ( pObjectName in MetaDataPkg.tString, pObjectType in MetaDataPkg.tString, pTableClause in MetaDataPkg.tString := null ); /* * LockTable */ procedure LockTable ( pTableName in MetaDataPkg.tString, pMode in MetaDataPkg.tString := 'EXCLUSIVE' ); function GetSwapName ( pName in MetaDataPkg.tString, pSuffix in MetaDataPkg.tString := cSwapSuffix ) return MetaDataPkg.tString; function GetRenameString ( pFromName in MetaDataPkg.tString, pToName in MetaDataPkg.tString, pType in MetaDataPkg.tString, pTable in MetaDataPkg.tString := null ) return MetaDataPkg.tDBString;end SwapOps;/
At the top, I defined two constants: cSwapSuffix (‘_T’) and cTempSuffix (‘_X’). That’s where you could modify the package if you wanted to use you own scheme for naming swap objects and temp objects.
I used the following procedures to automate the six steps I outlined at the beginning. Their names are self-explanatory:
CreateObjects()procedure for Steps 1, 3, and 4
ExecuteSQL()procedures for Step 2
SwapObjects()procedure for Step 5
RestoreSettings()procedure for Step 6
The following is the difference between the
ExecuteInsert()accepts a string for the WHERE clause of an INSERT statement, constructs that statement, and then executes it.
ExecuteSQL()is more general. It accepts any full SQL string, and you can use it to execute complex INSERT statements or other non-INSERT SQL statements such as UPDATE.
The SwapOps package also contains two wrapper procedures:
FullSQL(). These procedures call the above procedures?
RestoreSettings()?in the right order.
A SwapOps Demonstration
This section shows an example of how to use the SwapOps package. In “Optimize the Massive DELETE Operation in Oracle, Part 1” I created a table (TestData) with 500K+ rows that was logically partitioned by values in a runId column. If you run the following query:
select runId, count(*) from TestData group by runId;
You’ll get the following output:
RUNID COUNT(*)---------- ---------- 1 20496 2 20496 3 20496 4 20496 5 20496 6 20496 7 20496 8 20496 9 20496 10 20496 11 20496 12 20496 13 20496 14 20496 15 20496 16 20496 17 20496 18 20496 19 20496 20 20496 21 20496 22 20496 23 20496 24 20496 25 20496
Delete the data produced by the first 20 runs?rows with runId values between 1 and 20. The code for this is very simple: just one call of the
FullInsert()procedure with two parameters, tableName (‘TestData’), and a WHERE clause string for rows you want to keep (‘runId > 20’):
begin SwapOps.FullInsert('TestData', 'runId > 20');end;/
Now, run the first query again:
select runId, count(*) from TestData group by runId;
You’ll get this output:
RUNID COUNT(*)---------- ---------- 21 20496 22 20496 23 20496 24 20496 25 20496
As you can see, the data from the first 20 runs is gone. The default value for the pDrop parameter
FullInsert()procedure is ‘no’. That means the original table you renamed TestData_T was not dropped. For safety reasons, you may want to verify the results first and then drop it.
Now you can compare the performance of a straight or in-place delete operation to that of a swap delete.
In-place Delete vs. Swap Delete Performance
The first test deletes the results from runs with even runIDs (2, 4, 6, …, 24), which will be roughly 50 percent of the data in the TestData table:
select sum(case when mod(runId, 2) = 0 then 1 else 0 end) as To_Delete, sum(case when mod(runId, 2) > 0 then 1 else 0 end) as To_Keep from TestData; TO_DELETE TO_KEEP---------- ---------- 245952 266448
Run an in-place delete first:
delete from TestData where mod(runId, 2) = 0;245952 rows deleted.Elapsed: 00:00:39.87
Next, run a swap delete:
begin SwapOps.FullInsert('TestData', 'mod(runId, 2) > 0');end;/Elapsed: 00:00:14.28
The in-place delete took roughly 41 seconds, while the swap delete took slightly more than 14 seconds!
The TestData table has one primary key index and one non-unique index. I added one more non-unique index, then another non-unique index to test the index factor. The numbers in Table 1 show that the swap delete kept outperforming the in-place delete.
|In-place Delete (sec.)||Swap Delete (sec.)|
|Table with one primary key index and one non-unique index||39.87||14.28|
|Table with one primary key index and two non-unique indexes||52.39||24.12|
|Table with one primary key index and three non-unique indexes||64.18||34.90|
|Table 1. Results of Swap Delete vs. In-place Delete Performance Test|
Is In-place Delete Better Than Swap Delete?
Deleting almost half of the data (48 percent) obviously is much faster via a swap delete, but what about when the percentage of data you want to delete goes down (40, 25, 10, etc.)? At which point does an in-place delete become more attractive than a swap delete? Well, the less data you need to delete and the more data you need to insert, the more work you’ll have to do to rebuild indexes. For example, deleting 10 percent of the data via a swap delete means that 90 percent will be inserted in a new table and indexes will be rebuilt on 90 percent of the original data. Rebuilding indexes is a time-consuming operation. Believe or not, it’s the biggest time component of the swap delete process: during its 15-second total in Table 1, 5 seconds went to the actual insert and 9 seconds were spent on index creation.
The current code in the SwapOps package rebuilds indexes serially, one after another. One way to speed up that step is to rebuild the indexes in parallel using Oracle’s DBMS_JOB package, which I discuss it later in the article. (See Oracle9i Supplied PL/SQL Packages and Types Reference Release 2(9.2) for detailed information on DBMS_JOB package.)
In the case of partitioned indexes, you can achieve parallel functionality by utilizing the DBMS_PCLXUTIL package. It provides intra-partition parallelism for creating partition-wise local indexes. (See Oracle9i Supplied PL/SQL Packages and Types Reference Release 2(9.2) for detailed information on DBMS_PCLXUTIL package.)
I ran the performance tests on the TestData table with the following where clauses:
mod(runId, 2) = 0? 48% rows to delete, 52% - to keep (initial test)
mod(runId, 3) = 0? 32% rows to delete, 68% - to keep
mod(runId, 4) = 0? 24% rows to delete, 76% - to keep
mod(runId, 5) = 0? 20% rows to delete, 80% - to keep
mod(runId, 8) = 0? 12% rows to delete, 88% - to keep
Table 2 shows the performance results for the TestData table with one primary key index and one non-unique index.
|Rows to Delete (%)/Rows to Keep (%)||In-place Delete (sec.)||Swap Delete (sec.)|
|Table 2. Performance Results for the TestData Table (one primary key index, one non-unique index)|
The results show that the threshold for when an in-place delete becomes more attractive than a swap delete is very close to 20 percent: all delete operations that remove less then 20 percent of rows are better done by with in-place delete, and any delete operations that remove 20 percent rows or higher are faster via a swap delete.
Since rebuilding indexes is the most time-consuming component of the swap delete process, I believe the number of indexes in a table is a mayor factor that affects the threshold?the more indexes a table has, the higher threshold is. To test that theory, I added two more non-unique indexes on the TestData table:
create index TestData_IdNameon TestData( runId, objectName);create index TestData_IdDateon TestData( runId, createDate, status );
Table 3 shows the performance results for the TestData table with one primary key index and three non-unique indexes.
|Rows to Delete (%)/Rows to Keep (%)||In-place Delete (sec.)||Swap Delete (sec.)|
|Table 3. Performance Results for the TestData (one primary key index, three non-unique indexes)|
The results show that the threshold is higher for the heavier indexed TestData table. It moved from 20 percent to more than 25 percent.
Indexes are just one mayor factor that affects the threshold value. Other factors that may influence the threshold value include table width/number of columns, CLOB/BLOB columns existence, and partitioned tables.
Keep in mind that the actual threshold value may change from one database server to another?especially if they have different configurations that affect their logging performances. I ran the same tests on another database server with less efficient logging than the first and its threshold value came out lower, around 15 percent.
In-place Delete and Rebuilding of Indexes
In order to make a fair comparison between in-place delete and swap delete, you have to consider one additional step after an in-place delete: rebuilding indexes. In my experience as an Oracle database developer, I’ve seen some cases where massive delete operations severely affected indexes and the only way to resolve the problem and get indexes back in shape was index rebuilds. (I described such a case in the “Amazing Results” section of the 10-Minute Solution “Detecting and Fixing Out-of-Balance Indexes“.) Because of that, a truly fair comparison for massive delete operations would be in-place delete plus index rebuild vs. swap delete. Keep in mind that a swap delete does not need to rebuild indexes because it creates them from scratch during Step 3. That makes swap delete even more attractive and moves the threshold number considerably lower. I found that if you need to rebuild indexes after an in-place delete, then a swap delete outperforms an in-place delete?even if you need to delete just 10 percent of the rows in the table (See Table 4 for these results).
|Rows to Delete (%)/Rows to Keep (%)||In-place Delete (sec.)||Swap Delete (sec.)||In-place Delete + Index Rebuild (sec.)|
|Table 4. Results for In-place Delete Plus Index Rebuild vs. Swap Delete|
Using Parallel Execution with Swap Delete
I mentioned earlier in this article that as part of the swap delete process indexes get rebuilt one after another. You can improve that by rebuilding indexes in parallel using Oracle’s native DBMS_JOB package. (See my 10-Minute Solution “Run Oracle Stored Procedures in Parallel?Inside the Database” for more details.) However, I’ve chosen to use parallel execution per table granularity, executing a swap delete operation for each table in parallel. I had a bad performance problem: one batch process deleted data from the previous runs of 20+ tables?one after another. To illustrate the solution, create four clones of TestData table (TestData1 ? TestData4). To save testing time, each table has a primary key and only one non-unique index.
First, execute the old code scenario: an in-place delete, one table after another. This listing shows the old code:
delete from TestData1 where mod(runId, 2) = 0;commit;delete from TestData2 where mod(runId, 2) = 0;commit;delete from TestData3 where mod(runId, 2) = 0;commit;delete from TestData4 where mod(runId, 2) = 0;commit;Execution time: 148.67 sec.
For that article, I created a SwapDemo package with just one procedure (Delete) that accepted table name as a parameter. The code listing uses the JobPkg package’s SignalCompletion API (described in “Run Oracle Stored Procedures in Parallel?Inside the Database“).
The following is the SwapDemo package specification:
create or replace package SwapDemo is procedure Delete ( pTable varchar2 );end SwapDemo;/
The following is the SwapDemo package body:
create or replace package body SwapDemo is procedure Delete ( pTable varchar2 ) is begin SwapOps.FullInsert(pTable, 'MOD(RUNID, 2) > 0'); JobsPkg.SignalCompletion('SwapDemo.Delete(''' || pTable || ''')'); end Delete;end SwapDemo;/
Now you are ready to test the new solution, which combines the swap delete technique with the power of parallel execution.
The Proof Is in the Performance
The following listing shows a PL/SQL code block that demonstrates a combination of the swap delete technique and parallel execution.
declare vJobs JobsPkg.tArrayJob;begin vJobs(1) := 'SwapDemo.Delete(''TestData1'')'; vJobs(2) := 'SwapDemo.Delete(''TestData2'')'; vJobs(3) := 'SwapDemo.Delete(''TestData3'')'; vJobs(4) := 'SwapDemo.Delete(''TestData4'')'; JobsPkg.Execute(vJobs); JobsPkg.WaitForCompletion(vJobs);end;/
The above block produced the same data results as the in-place delete code from the “old code” listing, but with 88 percent performance gain. Execution time was down to 18.47 seconds from 148.67 seconds. The real-life original code was more complicated, but switching to the swap delete technique with parallel execution produced a significant performance gain: for 20+ tables with millions of rows and 50 percent of data to delete, the execution time went down from 20+ hours to just 22 minutes. That clearly shows the solution for a high-performance massive delete operation.