The strategy prescribed in this article is to move data from the load instance to the production instance via
DBMS_DATAPUMP over a database link.
For example, a job to refresh the production table ALL_SHIP_DATES_PLUS_BILL_HIST would go as follows:
1. On the load instance, download the raw data that goes into ALL_SHIP_DATES_PLUS_BILL_HIST.
2. On the load instance, transform the data into the resultset that will go into the production table.
3. Put this data into a table named ALL_SHIP_DATES_PLUS_BILL_HIST on the load instance.
[Note: If you were refreshing more than just this one table, you'd carry out steps 1-3 for all the other tables at this point before proceeding.]
4. Copy the data from ALL_SHIP_DATES_PLUS_BILL_HIST on the load instance to ALL_SHIP_DATES_PLUS_BILL_HIST on the production instance, using DBMS_DATAPUMP with a database link.
At this point, you may be wondering, 'Why not just use SQL to move the data over the database link?' For example, if you were copying the contents of
ALL_SHIP_DATES_PLUS_BILL_HIST from instance A to B, you could log on to instance B and issue these commands:
delete from ALL_SHIP_DATES_PLUS_BILL_HIST;
insert into ALL_SHIP_DATES_PLUS_BILL_HIST (select * from ALL_SHIP_DATES_PLUS_BILL_HIST@instanceA.world );
If you're moving more than a handful of tables' worth of data, DBMS_DATAPUMP is superior to the above approach for a number of reasons:
- Ease of use and manageability: You don't need to specify the name of each table individually. You can supply this information via a pattern, a schema, a tablespace, etc. This improves manageability as well as ease of programming: as new tables are added, make sure the table names fit the pattern or are in the appropriate schema or tablespace, and they will automatically be added to the Data Pump import.
- Performance: DBMS_DATAPUMP is optimized for batch performance, and it is self-tuning.
- Indexes: You can bring over indexes as well as tables, eliminating the need to rebuild indexes after refreshing tables.
- Interruptibility: DBMS_DATAPUMP exports and loads are interruptible and resumable. As mentioned in Sidebar 2: How Data Pump Works, information about each export or load is stored in a database table, so a stopped Data Pump process can easily be resumed where it left off.
- Logging, error handling, and troubleshooting: These capabilities are already extensively supported in DBMS_DATAPUMP. You'll have to write this code yourself if you implement your own load using straight selects over database links.
- Ease of monitoring: You can attach to the running Data Pump import at any time and get its status, percent completion, any errors, etc.
Creating a Job
Data Pump imports and exports are called jobs, even though they have nothing to do with the Oracle Job Queue. The word job captures the batch, interruptible nature of a Data Pump import or export. Be aware, though, that Data Pump jobs are not run by job queue processes; they are not created or handled via Oracle's job handling API, and they won't show up in DBA_JOBS or other Job Queue views.
The following three basic steps kick off a Data Pump job via the API, be it Import or Export:
- Create the job.
- Define parameters for the job.
- Start the job.
Once the job has started, you can optionally use the API to:
- Monitor the job until it completes;
- Detach from the job and reattach later;
- Stop the job; and
- Restart a previously stopped job.
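The stop/reattach/restart cycle in the list above can be sketched as follows. This is a hypothetical fragment, not code from the article: my_handle and my_job_name are assumed to be a handle and job name from an earlier dbms_datapump.open call, and the second session gets its own fresh handle via dbms_datapump.attach.

-- In the original session: pause the job, keeping its master table
-- so the job can be resumed later
dbms_datapump.stop_job (handle => my_handle, immediate => 0, keep_master => 1);

-- Later, possibly in a different session: reattach by job name
-- (attach returns a new handle valid in this session) and resume
my_handle := dbms_datapump.attach (job_name => my_job_name, job_owner => user);
dbms_datapump.start_job (my_handle);

Passing immediate => 0 lets worker processes finish their current items before stopping; keep_master => 1 preserves the master table that restart depends on.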
Step 1: Create the Job
Use the dbms_datapump.open function to create the job. This function returns a number that serves as the job handle. You'll use the job handle in all subsequent operations to refer to the job. (If you lose the job handle, you can get it again by calling dbms_datapump.attach.)
When you create the job, you need to provide the following basic information:
- Whether it's an import or export job
- What the job mode will be (the modes are like those in original Export/Import, e.g., TABLE mode vs. SCHEMA mode vs. FULL mode vs. ...)
- The name of the database link you'll be using, or NULL if none
You can optionally supply a job name, the version of the objects, and whether or not to compress metadata (see the documentation for full details). If you don't supply a job name, your job is given a system-generated name.
The following example declares a table-mode import:
my_handle := dbms_datapump.open (operation => 'IMPORT', job_mode => 'TABLE', remote_link => my_db_link, job_name => my_job_name, version => 'LATEST');
dbms_output.put_line ('Opened job ' || my_job_name || ' with handle: ' || my_handle);
. . .
Opened job IMP_PS2_TS with handle: 53
Note that this handle is invalid out of context (i.e., if you get a handle, stop execution, and later try to plug the handle that's showing on your screen into a new block, it won't work). To attach to this job in a different context, you need to get a fresh handle using dbms_datapump.attach.
You can see what you've done so far by querying DBA_DATAPUMP_JOBS. Your job should show up with the name you've assigned it and the state DEFINING.
You'll also be able to see the master table defined for this job. The master table is in your schema, and it has the same name as the job. (Consequently, you cannot create a job with the same name as an existing schema object.)
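A quick check against the view might look like the following sketch (column names are those of DBA_DATAPUMP_JOBS; the owner filter assumes you created the job as the logged-in user):

select job_name, operation, job_mode, state
from   dba_datapump_jobs
where  owner_name = user;

At this point the job exists only in its defining state; the master table of the same name should also be visible in your schema via USER_TABLES.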
Step 2: Define Parameters for the Job
To define parameters for the job, you must do two main things:
- Set any optional parameters.
- Name the objects in the job.
You set optional parameters by calling dbms_datapump.set_parameter. A particularly useful parameter for this task is the one that controls Data Pump Import's behavior when it tries to import a table that already exists. You can choose from TRUNCATE, REPLACE, APPEND, or SKIP. Since you're pulling pre-formatted data from the load instance into an existing table on the production instance, choose TRUNCATE or REPLACE.
The advantage of
TRUNCATE is that it preserves grants, and you can set up
DBMS_DATAPUMP to re-create those grants (although that can lead to messy cascades of invalidated procedures). The advantage of
REPLACE is that you can easily change table structures by changing them on the load instance only, and then letting the night's data pump job propagate the changes to production. That sure beats manually maintaining identical table definitions on both instances.
The set_parameter procedure is very easy to use, as this command shows:
dbms_datapump.set_parameter (my_handle, name => 'TABLE_EXISTS_ACTION', value => 'TRUNCATE');
To demonstrate the second part of defining parameters, naming the objects that will be included in this job, pick the names of the tables that you want included from the above table-level import. You do this using the
dbms_datapump.metadata_filter procedure. You tell it what type of objects you're naming using the
object_type parameter, and then provide either an expression that the object names must match to be included in your dataset or a list of object names to include. This command retrieves just one table:
dbms_datapump.metadata_filter (handle => my_handle, name => 'NAME_LIST', value => '(''ALL_SHIP_DATES_PLUS_BILL_HIST'')', object_type => 'TABLE');
Step 3: Kick Off the Job
You start the job by calling dbms_datapump.start_job. Again, this is straightforward:
dbms_datapump.start_job (my_handle);
dbms_output.put_line ('Started job ' || my_job_name || ' with handle: ' || my_handle);
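Once the job is running, a common pattern is to block until it finishes rather than polling. A minimal sketch using dbms_datapump.wait_for_job, assuming my_handle is the handle from the open call above (on very old releases without wait_for_job you would loop on dbms_datapump.get_status instead):

declare
  my_state varchar2(30);
begin
  -- Blocks until the job completes or stops, then returns its final state
  dbms_datapump.wait_for_job (handle => my_handle, job_state => my_state);
  dbms_output.put_line ('Job finished with state: ' || my_state);
end;

A final state of COMPLETED means the import succeeded; STOPPED means it was halted and can be reattached and restarted as described earlier.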