advertisement
Premier Club Log In/Registration
  Include Code  Search Tips
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   SKILLBUILDING  |   TIP BANK  |   SOURCEBANK  |   FORUMS  |   NEWSLETTERS
Browse DevX
Solutions for Minimizing the Impact of Slow ETL Operations on Your Production Instance
How Data Pump Works
About GET_STATUS
Partners & Affiliates
advertisement
advertisement
Rate this item | 0 users have rated this item.
 Print Print
 
The Oracle 10g Data Pump API Speeds Up the ETL Process
How can you keep slow extract, transform, and load operations from bogging down your production instance? Oracle 10g's Data Pump API offers a great solution. 

advertisement
onsider the following scenario. You have a production system that needs a lot of data from other systems on hand during the day. The data sources that you get this data from return results slowly, and you've got network latency that results in slow extracts. Additionally, a fair amount of transformations have to be done to the data, which also takes time. Thus, it's time-consuming to extract, transform, and load these data from the other systems into your main instance. How can you keep these long, slow extract, transform, and load (ETL) operations from bogging down your production instance?

Of course, if you have the access and authority to push for changes in the source systems, you can speed up the ETL process by speeding up the extract at the source end, and even by providing the data in a format that needs less transformation. But in the real world, you often can't waltz in and demand that another team optimize for delivery to your own project. Long, slow extracts and complex transformations are a fact of life, and it's up to you to minimize the impact on your system.

The best solution—that you don't have to buy any outside products to implement—is creating a separate load instance, running the ETL on this instance, and then shipping the finished, freshly loaded tables to the production instance (see Sidebar 1: Solutions for Minimizing Impact of Slow ETL Operations on Your Production Instance for a few others). This article demonstrates how to implement this solution, which involves three steps:

  1. Set up a separate load instance.
  2. Carry out the ETL on the load instance.
  3. Ship the finished, freshly loaded tables from the load instance to the production instance.
Step 1 is a straightforward DBA matter. Step 2 requires you to know your data sources and your target system. The major implementation decision is: how shall I carry out Step 3? What's the best way to ship the data from the load instance to the production instance?

What You Need
You will need access to two Oracle instances running Oracle 10gR1 or 10gR2, and the CREATE DATABASE LINK privilege on at least one instance.

Oracle Data Pump

Oracle 10g has a new feature that offers a great solution. Oracle Data Pump is Oracle's next-generation version of Export and Import. (For those unfamiliar with these tools, Export extracts data and metadata from an Oracle database into a proprietary-format dump file; Import extracts data and metadata from an Export dump file and recreates the source objects in the target database. See Sidebar 2: How Data Pump Works.)

Oracle Data Pump can be used to move data between databases, to copy databases, or to back them up (although Oracle recommends you use RMAN for database copies and backups, not Data Pump). From a programmer's point of view, there are three big differences between Data Pump and the original Export/Import:

  1. Data Pump is faster. A lot faster.
  2. Data Pump jobs are resumable, and you can attach/detach to them while they run.
  3. Data Pump can be controlled from within the database. You can kick off a Data Pump export or import job from a PL/SQL procedure, or use PL/SQL to monitor the progress of a Data Pump job from within the database.

Data Pump uses two access methods to import and export table data: Direct Path load/unload and External Tables. It chooses the access method behind the scenes on a per-table basis, and the two are mutually compatible—data unloaded with one method can be loaded with the other.

Most programmers know Data Pump through the two command-line utilities (expdp and impdp) that replace the original Export and Import (exp and imp). The expdp and impdp utilities have similar interfaces to exp and imp and can be used for the same tasks. Note, however, that Data Pump dump files are not compatible with original Export and Import, and vice versa. (For a full comparison between exp/imp and expdp/impdp, refer to Chapters 1 and 2 of the Oracle Database Utilities manual—login required.)

Data Pump is also accessible through a PL/SQL API. In fact, expdp and impdp are just front-ends to this API. The command-line utilities transform your command into calls to the Data Pump API, which is implemented as the DBMS_DATAPUMP PL/SQL supplied package, and the DBMS_METADATA package, which handles all operations involving metadata (i.e., database object definitions, such as table structure, storage parameters, etc).

Page 1 of 4
advertisement
  Next Page: The DataPump API
Page 1: IntroductionPage 3: Putting It Together
Page 2: The DataPump APIPage 4: Large Amounts of Data
advertisement
Advertising Info  |   Member Services  |   Permissions  |   Contact Us  |   Help  |   Feedback  |   Site Map  |   Network Map  |   About


JupiterOnlineMedia

internet.comearthweb.comDevx.commediabistro.comGraphics.com

Search:

Jupitermedia Corporation has two divisions: Jupiterimages and JupiterOnlineMedia

Jupitermedia Corporate Info


Legal Notices, Licensing, Reprints, & Permissions, Privacy Policy.

Advertise | Newsletters | Tech Jobs | Shopping | E-mail Offers

Solutions
Whitepapers and eBooks
Microsoft Article: HyperV-The Killer Feature in WinServer ‘08
Avaya Article: How to Feed Data into the Avaya Event Processor
Microsoft Article: Install What You Need with Win Server ‘08
HP eBook: Putting the Green into IT
Whitepaper: HP Integrated Citrix XenServer for HP ProLiant Servers
Intel Go Parallel Portal: Interview with C++ Guru Herb Sutter, Part 1
Intel Go Parallel Portal: Interview with C++ Guru Herb Sutter, Part 2--The Future of Concurrency
Avaya Article: Setting Up a SIP A/S Development Environment
IBM Article: How Cool Is Your Data Center?
Microsoft Article: Managing Virtual Machines with Microsoft System Center
HP eBook: Storage Networking , Part 1
Microsoft Article: Solving Data Center Complexity with Microsoft System Center Configuration Manager 2007
MORE WHITEPAPERS, EBOOKS, AND ARTICLES
Webcasts
Intel Video: Are Multi-core Processors Here to Stay?
On-Demand Webcast: Five Virtualization Trends to Watch
HP Video: Page Cost Calculator
Intel Video: APIs for Parallel Programming
HP Webcast: Storage Is Changing Fast - Be Ready or Be Left Behind
Microsoft Silverlight Video: Creating Fading Controls with Expression Design and Expression Blend 2
MORE WEBCASTS, PODCASTS, AND VIDEOS
Downloads and eKits
Sun Download: Solaris 8 Migration Assistant
Sybase Download: SQL Anywhere Developer Edition
Red Gate Download: SQL Backup Pro and free DBA Best Practices eBook
Red Gate Download: SQL Compare Pro 6
Iron Speed Designer Application Generator
MORE DOWNLOADS, EKITS, AND FREE TRIALS
Tutorials and Demos
How-to-Article: Preparing for Hyper-Threading Technology and Dual Core Technology
eTouch PDF: Conquering the Tyranny of E-Mail and Word Processors
IBM Article: Collaborating in the High-Performance Workplace
HP Demo: StorageWorks EVA4400
Intel Featured Algorhythm: Intel Threading Building Blocks--The Pipeline Class
Microsoft How-to Article: Get Going with Silverlight and Windows Live
MORE TUTORIALS, DEMOS AND STEP-BY-STEP GUIDES