
Optimize the Massive DELETE Operation in Oracle, Part 1


A massive DELETE operation deletes millions of rows from a table that has indexes and constraints. The operation is database-intensive and time-consuming, mainly because it forces the database to generate and write to disk significant amounts (possibly gigabytes) of redo and undo data.

You can perform massive DELETEs as bulk INSERT operations instead: rather than removing the data you no longer need, you insert the data you want to keep. The key to this approach is performing it efficiently, with minimum logging, by using direct-path INSERT. Part 1 of this article series compares direct-path INSERT with regular INSERT. It also describes the technique of performing DELETEs as INSERTs. Part 2 will discuss the packaged APIs that implement the bulk INSERT technique.



How do I complete a massive DELETE operation in my Oracle database without having to pay the heavy performance overhead?



Perform the massive DELETE operation as a direct-path (direct-load) INSERT (an INSERT with the APPEND hint) into a table whose logging attribute is set to NOLOGGING. This operation completes significantly faster than a DELETE and with minimum logging, but you have to take a backup afterward to establish a new baseline.

Why INSERT Could Be Faster Than DELETE (or UPDATE)
Direct-path INSERT is a special database operation. Like direct-path SQL*Loader, it writes data directly to the database files, bypassing the buffer cache. It does this with minimum logging, recording only data dictionary changes. The rationale is that because the data files are already up to date when an instance failure occurs, redo logging isn't necessary.

The two distinct cases in which direct-path INSERT is important are when:

  1. The database is in noarchivelog mode. Media recovery is not possible, so redo data is not needed for it.
  2. The database is in archivelog mode. By default, it logs redo blocks for media recovery. However, if you explicitly set a table to NOLOGGING mode, the database won't log redo blocks for direct-path operations against it.

Therefore, when the database is in noarchivelog mode, or when it is in archivelog mode and the table is in NOLOGGING mode, direct-path INSERT performs only minimum redo logging—just enough to protect the data dictionary.
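To make this concrete, here is a minimal sketch of a direct-path INSERT (the table and source names are hypothetical placeholders):

```sql
-- Put the target table in NOLOGGING mode (relevant only in archivelog mode)
alter table BigTable nologging;

-- The APPEND hint requests a direct-path insert: data is written above the
-- high-water mark, bypassing the buffer cache, with minimal redo generation
insert /*+ APPEND */
into BigTable
select *
  from StagingTable;

-- A direct-path insert must be committed (or rolled back) before the same
-- session can query or modify the table again
commit;
```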

Need Proof? Generate Test Data
To demonstrate the difference between direct-path and regular INSERTs, create test data. First, create a table named TestData with half a million rows, a primary key, one non-unique index, and a check constraint. You will use this table in all of the article’s code and performance tests. It employs data from the all_objects view as its source.


First, create the table with the following code:

drop table TestData;

create table TestData
(
   objectId     integer      not null,
   objectName   varchar2(30) not null,
   objectType   varchar2(30) not null,
   runId        integer      not null,
   createDate   date                 ,
   status       varchar2(7)          ,
   description  varchar2(255)
)
nologging;

Next, populate it with data using the following block. (Since the all_objects view in Oracle 9.2 has 20K+ rows, the block performs the INSERT 25 times to total 500K+ rows in the TestData table.)

declare
   vCount pls_integer := 25;
begin
   for nIndex in 1..vCount
   loop
      insert /*+ APPEND */
      into TestData
      (
         objectId,
         objectName,
         objectType,
         runId,
         createDate,
         status,
         description
      )
      select object_id,
             object_name,
             object_type,
             nIndex,
             created,
             status,
             object_name
        from all_objects;

      commit;
   end loop;
end;
/

Check the row count:

select count(*)
  from TestData;

  COUNT(*)
----------
    547875

Add the primary key on the objectId and runId columns, one non-unique index on the objectName and objectType columns, and one check constraint on the runId:

alter table TestData
add constraint TestData_PK
primary key
(
  objectId,
  runId
);

create index TestData_NameType
on TestData
(
  objectName,
  objectType
);

alter table TestData
add constraint TestData_CK
check (runId > 0);

Gather the statistics using the following block:

begin
   dbms_stats.gather_table_stats
   (
      ownname    => user,
      tabname    => 'TestData',
      method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO',
      degree     => 4,
      cascade    => true
   );
end;
/

Now, create two tables, TestData_Logging and TestData_Nologging, with structures identical to TestData but with no indexes or constraints:

create table TestData_Logging as
select *
  from TestData
 where 1 = 0;

create table TestData_Nologging nologging as
select *
  from TestData
 where 1 = 0;

TestData_Logging is in LOGGING mode, while TestData_Nologging is in NOLOGGING mode:

select table_name,
       logging
  from user_tables
 where table_name like 'TESTDATA\_%' escape '\';

TABLE_NAME                     LOGGING
------------------------------ -------
TESTDATA_LOGGING               YES
TESTDATA_NOLOGGING             NO

Take a snapshot of the redo size before doing a regular INSERT:

select a.name,
       b.value
  from v$statname a,
       v$mystat b
 where a.statistic# = b.statistic#
   and a.name = 'redo size';

NAME                    VALUE
----------------- -----------
redo size            39867216

Perform the INSERT into the TestData_Logging table (Scenario 1):

set timing on

insert into TestData_Logging
select *
  from TestData;

547875 rows created.

Elapsed: 00:00:08.28

Take a snapshot of redo and undo after the INSERT:

select a.name,
       b.value
  from v$statname a,
       v$mystat b
 where a.statistic# = b.statistic#
   and a.name = 'redo size';

NAME                    VALUE
----------------- -----------
redo size            93019404

select used_ublk
  from v$transaction;

 USED_UBLK
----------
       238

commit;

Populating the TestData_Logging table with 547,875 rows in LOGGING mode took 8.28 seconds and forced the database to generate 50.69MB (93,019,404 – 39,867,216 = 53,152,188 bytes) of redo and 238 undo blocks.

Now, repeat the same steps for the table in NOLOGGING mode (Scenario 2):

  1. Obtain performance results before load:
    select a.name,
           b.value
      from v$statname a,
           v$mystat b
     where a.statistic# = b.statistic#
       and a.name = 'redo size';

    NAME                    VALUE
    ----------------- -----------
    redo size            93019496

    insert /*+ APPEND */
    into TestData_Nologging
    select *
      from TestData;

    547875 rows created.

    Elapsed: 00:00:05.39
  2. Obtain performance results after load:
    select a.name,
           b.value
      from v$statname a,
           v$mystat b
     where a.statistic# = b.statistic#
       and a.name = 'redo size';

    NAME                    VALUE
    ----------------- -----------
    redo size            93120212

    select used_ublk
      from v$transaction;

     USED_UBLK
    ----------
             1

Populating the TestData_Nologging table with 547,875 rows in NOLOGGING mode took 5.39 seconds and generated only 0.10MB (93,120,212 – 93,019,496 = 100,716 bytes) of redo and a single undo block.


Table 1 shows the performance numbers for both scenarios.

                                                 Elapsed Time (sec.)   Redo Size (MB)   Undo Blocks
Scenario 1: Regular INSERT, LOGGING mode                        8.28            50.69           238
Scenario 2: Direct-path INSERT, NOLOGGING mode                  5.39             0.10             1

Table 1. Performance Results for Scenarios 1 and 2

However, if you add just a primary key constraint to the TestData_Nologging table (Scenario 3), the redo size skyrockets even though both the table and the primary key index are in NOLOGGING mode:

drop table TestData_Nologging;

create table TestData_Nologging nologging as
select *
  from TestData
 where 1 = 0;

alter table TestData_Nologging
add constraint TestData_Nologging_PK
primary key
(
  objectId,
  runId
);

alter index TestData_Nologging_PK nologging;

Obtain performance results before load:

select a.name,
       b.value
  from v$statname a,
       v$mystat b
 where a.statistic# = b.statistic#
   and a.name = 'redo size';

NAME                    VALUE
----------------- -----------
redo size            93198748

insert /*+ APPEND */
into TestData_Nologging
select *
  from TestData;

547875 rows created.

Elapsed: 00:00:15.61

Obtain performance results after load:

select a.name,
       b.value
  from v$statname a,
       v$mystat b
 where a.statistic# = b.statistic#
   and a.name = 'redo size';

NAME                    VALUE
----------------- -----------
redo size           131033428

select used_ublk
  from v$transaction;

 USED_UBLK
----------
       817

commit;

As you can see, adding the index increased the elapsed time from 5.39 to 15.61 seconds, the redo size from 0.10MB to 36.08MB (131,033,428 – 93,198,748 = 37,834,680 bytes), and the undo blocks from 1 to 817 (!). The reason is that the database did heavy logging for the primary key index. Putting the primary key index in NOLOGGING mode did not help, because NOLOGGING applies only to a limited number of operations. Here is the list of those operations from the Oracle9i SQL Reference Release 2 (9.2):

DML:
  • Direct-path INSERT (serial or parallel)
  • Direct Loader (SQL*Loader)

DDL:

  • CREATE TABLE ... AS SELECT
  • ALTER TABLE ... MOVE
  • ALTER TABLE ... [all partition operations that involve data movement]
    • ALTER TABLE ... ADD PARTITION (hash partition only)
    • ALTER TABLE ... MERGE PARTITIONS
    • ALTER TABLE ... SPLIT PARTITION
    • ALTER TABLE ... MOVE PARTITION
    • ALTER TABLE ... MODIFY PARTITION ... ADD SUBPARTITION
    • ALTER TABLE ... MODIFY PARTITION ... COALESCE SUBPARTITION
    • ALTER TABLE ... MODIFY PARTITION ... REBUILD UNUSABLE INDEXES
  • CREATE INDEX
  • ALTER INDEX ... REBUILD
  • ALTER INDEX ... REBUILD PARTITION
  • ALTER INDEX ... SPLIT PARTITION

See the Oracle9i SQL Reference Release 2 (9.2) for more details on the NOLOGGING clause.
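You can confirm that the NOLOGGING attribute really took effect on the index by querying the data dictionary (this only verifies the setting; as shown above, conventional DML against the index ignores it anyway):

```sql
select index_name,
       logging
  from user_indexes
 where table_name = 'TESTDATA_NOLOGGING';
```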

The above list shows that you can take advantage of the CREATE INDEX operation by rearranging your steps:

  1. Load the data into a table with no indexes.
  2. Create indexes in NOLOGGING PARALLEL mode.
  3. Create constraints.

The following series of commands for the TestData_Nologging table implements the above technique (Scenario 4):

drop table TestData_Nologging;

create table TestData_Nologging nologging as
select *
  from TestData
 where 1 = 0;

insert /*+ APPEND */
into TestData_Nologging
select *
  from TestData;

commit;

create unique index TestData_Nologging_PK
on TestData_Nologging
(
  objectId,
  runId
)
nologging parallel;

alter table TestData_Nologging
add constraint TestData_Nologging_PK
primary key
(
  objectId,
  runId
);

Scenario 4 turns out to be much more efficient than Scenario 3: the redo size went down from 36.08MB to 0.17MB and the undo blocks from 817 to just 2.

Table 2 shows the performance numbers for both scenarios 3 and 4.

                                      Elapsed Time (sec.)   Redo Size (MB)   Undo Blocks
Scenario 3: Direct-path INSERT into
a table with an existing primary key                15.61            36.08           817

Scenario 4: Direct-path INSERT into
a table with no primary key, creating
the primary key index in NOLOGGING
PARALLEL mode, then adding the
primary key                                         12.34             0.17             2

Table 2. Performance Results for Scenarios 3 and 4

The numbers clearly show that Scenario 4 is the way to go. Now it’s time to talk about implementing DELETEs as bulk INSERTs.

Performing Massive DELETEs as Bulk INSERTs
The following steps implement the technique of performing DELETEs as direct-path INSERTs:

  1. Create a table (TestData_X) with a structure identical to the TestData table from the previous section, but with no primary key, alternate key, or foreign key constraints (NOLOGGING option).
  2. Insert the data that you need to keep into the newly created table using direct-path INSERT.
  3. Create indexes on the new table with NOLOGGING PARALLEL options.
  4. Create constraints on the new table. Use the ENABLE NOVALIDATE option for referential integrity and check constraints.
  5. Swap the TestData and TestData_X tables: rename TestData to TestData_T, rename TestData_X to TestData.
  6. Restore the original logging settings.
  7. Verify the results and drop the TestData_T table.
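Part 2 presents the packaged implementation of these steps; purely as an illustration, they might be sketched like this (the retention predicate runId > 20 is a placeholder — in a real run it would select whichever rows you want to keep):

```sql
-- Step 1: empty copy of TestData, no constraints or indexes, NOLOGGING
create table TestData_X nologging as
select * from TestData where 1 = 0;

-- Step 2: direct-path insert of the rows to KEEP; everything else is "deleted"
insert /*+ APPEND */
into TestData_X
select *
  from TestData
 where runId > 20;   -- placeholder predicate
commit;

-- Step 3: rebuild the indexes with minimal logging
create unique index TestData_X_PK
on TestData_X (objectId, runId)
nologging parallel;

-- Step 4: add constraints; ENABLE NOVALIDATE skips checking existing rows
alter table TestData_X add constraint TestData_X_PK
primary key (objectId, runId);

alter table TestData_X add constraint TestData_X_CK
check (runId > 0) enable novalidate;

-- Step 5: swap the tables
rename TestData   to TestData_T;
rename TestData_X to TestData;

-- Step 6: restore the original logging setting
alter table TestData logging;

-- Step 7: after verifying the results, drop the old table
-- drop table TestData_T;
```

Note that dependent objects (grants, synonyms, triggers) do not follow a renamed table and would need to be re-created as part of the swap.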

Optimize the Massive DELETE Operation in Oracle, Part 2 will present the SwapOps package, which contains all of the APIs necessary to implement the above steps. It will also present performance tests for DELETE and direct-path INSERT, and compare the two.
