dcsimg
Login | Register   
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX

By submitting your information, you agree that devx.com may send you DevX offers via email, phone and text message, as well as email offers about other products and services that DevX believes may be of interest to you. DevX will process your information in accordance with the Quinstreet Privacy Policy.


advertisement
 

Cloning Data Accurately and Randomly-3 : Page 3


advertisement

WEBINAR:

On-Demand

Application Security Testing: An Integral Part of DevOps


Creating a Custom Package
In case you're thinking of extracting sample data with Oracle's SAMPLE clause, don't bother. I've only had negative experiences using it. (Read the sidebar.)

Caveat for Extracting Sample Data

 
Oracle's SAMPLE keyword allows you to extract a sample block of data from a table. Read on to find out why you needn't bother...

Instead, you're best off creating a custom package, as I did. ParticipantData uses the previously created Rand package (see Listing 1) to extract sample data accurately and randomly.



Here is the algorithm:

  1. Calculate the number of rows in the table (T) for criteria specified in the WHERE clause.
  2. Calculate the number of rows you need to retrieve (N) based on the percentage (P) passed to the procedure as a parameter using the following formula:

    N = (T * P) / 100

  3. Create a SELECT statement that dynamically assigns a random number to each row, sorts it by this number, and extracts the top N rows.
The key to this plan is the SELECT statement. Here is an example of extracting 500 rows randomly for the period number 43:

SELECT ParticipantSeq FROM (SELECT P.*, Rand.RandomNum AS MyNum FROM Participant P WHERE StartPeriod = 43 ORDER BY MyNum) WHERE ROWNUM <= 500;

To make this work, I created two procedures, ModifyRandomByPeriod and ModifyRandomPeriodList:
  • Procedure ModifyRandomByPeriod does all the work per period, specified by the p_StartPeriod parameter, by extracting a randomly specified percentage of the data and modifying the values in one of the fields of the Participant table.

  • Procedure ModifyRandomPeriodList accepts the list of comma-delimited periods, or 'ALL,' for the whole table as a parameter to extract and modify the data across multiple periods. It parses the list, creates an array (a PL/SQL table) of periods, and calls ModifyRandomByPeriod for each of them in a loop.
Listing 2 shows the code from the ParticipantData package that's relevant to your task. In my case, there are 15,568 records in my Participant table for the "January 1999" period. What happens when I use the ModifyRandomByPeriod procedure and run it five times, setting the parameter p_Percentage to 10 percent?

The data in csr_RandomParticipants cursor, and its order, is completely random after each run. Moreover, the number of rows extracted (and later cloned) are 100 percent accurate:

1557
1557
1557
1557
1557
My task is accomplished. So is yours.

As you can see, the technique described in this 10-Minute Solution is pretty generic and can be used to extract random data sample from any table or set of related tables.



Boris Milrud has 10 years of experience as a software developer. He is a Senior Database Engineer with Callidus Software, Inc. located in San Jose CA. He is specializing in all aspects of Oracle database software development including database design, programming, optimization and tuning.
Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap
×
We have made updates to our Privacy Policy to reflect the implementation of the General Data Protection Regulation.
Thanks for your registration, follow us on our social networks to keep up-to-date