
Generating Realistic and Non-Random Sample Data

Generating realistic sample data is often harder than you might think, and generating sample data that you can use for a successful data-mining demonstration requires even more planning. This article explores the process and presents guidelines for realistic sample data generation. 


Have you ever needed to populate a database with realistic, but generated (as opposed to actual) data, in volumes that demand automation? Such a need arises in many circumstances, such as:


  • Stress-testing on new applications where historical data does not yet exist.
  • Data Scrubbing, where historical, representative data is available but the identity of customers and other sensitive data must be obscured for some reason, such as a demonstration open to the public.
  • Training and Documentation, where representative data would improve comprehension.
  • Sales and Marketing Proof-of-Concept, where realistic data, especially if tailored to a prospect's industry, would be much more compelling than the usual one-size-fits-all.
Sometimes, you need data that is not only realistic, but also "tells a story." This is typical in a business intelligence (BI) or data mining context, where you want to superimpose predetermined patterns on the data in the data warehouse or datamart so that the patterns can then be "discovered" in the course of the exercise. After all, pattern discovery is one of the main reasons for applying these technologies.
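To make the idea of a superimposed pattern concrete, here is a minimal Python sketch, not the article's own method. It generates sales rows that are random except for one planted rule. The regions, products, the 70% weighting, and the function names `generate_sales` and `gadget_share` are all hypothetical, chosen only for illustration.

```python
import random

# Hypothetical dimension values, used only for illustration.
REGIONS = ["North", "South", "East", "West"]
PRODUCTS = ["Widget", "Gadget", "Gizmo"]


def generate_sales(n_rows, seed=42):
    """Generate sample sales rows with one planted pattern."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    rows = []
    for _ in range(n_rows):
        region = rng.choice(REGIONS)
        if region == "West":
            # Planted pattern (assumption): West favors Gadget about 70% of the time.
            weights = [15, 70, 15]
        else:
            # Everywhere else, products are equally likely.
            weights = [1, 1, 1]
        product = rng.choices(PRODUCTS, weights=weights, k=1)[0]
        rows.append({
            "region": region,
            "product": product,
            "quantity": rng.randint(1, 10),
        })
    return rows


def gadget_share(rows, region):
    """Return the fraction of a region's rows whose product is Gadget."""
    in_region = [r for r in rows if r["region"] == region]
    if not in_region:
        return 0.0
    return sum(r["product"] == "Gadget" for r in in_region) / len(in_region)


rows = generate_sales(10_000)
```

Comparing `gadget_share(rows, "West")` with the share for any other region should surface the planted preference, which is the kind of "discovery" a data-mining demonstration is meant to reveal.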
