Question:
Do the table partitioning features (fragmenting) in XPS solve the problem of unloading old data in a data warehouse? Specifically, I need to delete up to 400MB of old data from one table and insert 400MB of new data. Can I avoid locking and logging overhead by dropping a fragment that contains data belonging to a particular date range? Is this more or less instantaneous, or does updating indexes still represent major overhead?
Answer:
Your question is not specifically XPS-oriented; the same principles apply to tables created in OnLine Dynamic Server instances. To start, let’s briefly review the options available to fragment tables.
Informix offers two methods of fragmentation: “by expression” and “round robin.” You can fragment indexes as well as tables, although round robin makes little sense for an index: nothing would tell a lookup which fragment holds a given key, so it would have to probe every fragment. Each method has its own strengths and weaknesses, so which one is more appropriate depends on how the table is used.
With round robin fragmentation, rows are distributed in turn across all of the table’s fragments, much as RAID level 0 stripes data across disks. The advantage and the disadvantage of this method are one and the same: to find a row (when no index exists to assist the search), every fragment of the table must be scanned. The fragments are usually scanned all at once, which ties up the entire table. Because each disk holds only a slice of the table, the scan should finish faster than a scan of the whole table on a single drive. The problem with this method (and with RAID level 0 in general) is that every such query must touch every fragment, so concurrent scans all compete for the same disks and the table behaves, in effect, as if it could service only one request at a time.
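To make the scan-everything behavior concrete, here is a toy Python model of round robin placement. It is only an illustration under simplifying assumptions (each fragment is an in-memory list, there are no indexes, and the class name `RoundRobinTable` is hypothetical); it is not how the server is implemented.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


class RoundRobinTable:
    """Toy model of a round robin fragmented table (not Informix code)."""

    def __init__(self, n_fragments: int) -> None:
        # Each inner list stands in for one fragment on its own disk.
        self.fragments: List[List[Row]] = [[] for _ in range(n_fragments)]
        self._next = 0  # fragment that receives the next inserted row

    def insert(self, row: Row) -> None:
        # Deal rows to fragments in turn, like striping in RAID level 0.
        self.fragments[self._next].append(row)
        self._next = (self._next + 1) % len(self.fragments)

    def find(self, predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
        # With no index and no fragment expression, nothing tells us where
        # a row lives, so every fragment has to be scanned.
        matches: List[Row] = []
        scanned = 0
        for fragment in self.fragments:
            scanned += 1
            matches.extend(row for row in fragment if predicate(row))
        return matches, scanned


table = RoundRobinTable(4)
for i in range(10):
    table.insert({"id": i})
rows, scanned = table.find(lambda row: row["id"] == 7)
print([len(f) for f in table.fragments], rows, scanned)  # [3, 3, 2, 2] [{'id': 7}] 4
```

Even though the matching row sits in a single fragment, all four fragments are read; that is the cost every query without a usable index pays under round robin.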
Fragmentation by expression lets you define the conditions that determine which fragment (and therefore which dbspace and drive) each row is stored in. This lets you choose conditions that speed up your most common queries, and it also lets you create historical tables. Writing fragmentation expressions is not covered here, but almost any column (or combination of columns) in the table can serve as the basis of the expression.
From a query optimization point of view, the primary advantage of fragmentation by expression is that the Informix query optimizer uses the fragmentation expressions to refine the query path through the table. If a query’s conditions and the table’s fragmentation scheme are such that only a few of the table’s fragments could possibly contain the requested data, only those fragments are scanned. This allows a table to service multiple requests at once, provided the requests don’t all need data from the same fragments.
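The following toy Python model sketches that fragment elimination for a table fragmented on a date column. It assumes each fragment’s expression is a half-open range `lo <= key < hi`; the fragment names (`dbs_q1` and so on) and dates are made up, and the model has no remainder fragment. The optimizer’s real logic handles far more general expressions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple

Row = Tuple[date, str]


@dataclass
class RangeFragment:
    name: str   # hypothetical label, e.g. the dbspace holding the fragment
    lo: date    # fragment expression: lo <= key < hi
    hi: date
    rows: List[Row] = field(default_factory=list)


class ExpressionTable:
    """Toy model of fragmentation by expression on a date column."""

    def __init__(self, fragments: List[RangeFragment]) -> None:
        self.fragments = fragments

    def insert(self, key: date, payload: str) -> str:
        # Route the row to the one fragment whose expression it satisfies.
        for frag in self.fragments:
            if frag.lo <= key < frag.hi:
                frag.rows.append((key, payload))
                return frag.name
        raise ValueError(f"no fragment accepts {key}")  # toy has no remainder fragment

    def query(self, start: date, end: date) -> Tuple[List[Row], List[str]]:
        # Return rows with start <= key < end, plus the fragments scanned.
        hits: List[Row] = []
        scanned: List[str] = []
        for frag in self.fragments:
            if frag.hi <= start or frag.lo >= end:
                continue  # eliminated: no row in this fragment can qualify
            scanned.append(frag.name)
            hits.extend(r for r in frag.rows if start <= r[0] < end)
        return hits, scanned


table = ExpressionTable([
    RangeFragment("dbs_q1", date(1998, 1, 1), date(1998, 4, 1)),
    RangeFragment("dbs_q2", date(1998, 4, 1), date(1998, 7, 1)),
    RangeFragment("dbs_q3", date(1998, 7, 1), date(1998, 10, 1)),
])
for day, label in [(date(1998, 2, 15), "a"), (date(1998, 5, 1), "b"), (date(1998, 8, 20), "c")]:
    table.insert(day, label)
rows, scanned = table.query(date(1998, 5, 1), date(1998, 6, 1))
print(scanned)  # ['dbs_q2']
```

A query for May touches only the second-quarter fragment, leaving the other two free to serve other requests.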
I usually recommend against using serial or date-related columns in fragmentation expressions unless you intend to create historical tables. Such columns make it very easy to “detach” a fragment from the original table and create a historical table holding the data for a known time frame or numeric range. That new table can, of course, be dropped once its data is no longer needed.
In terms of overhead, detaching a fragment is much cheaper than executing an SQL DELETE statement. When a fragment is detached, the only things it carries over from the original table are the data the fragment contained and its extent sizing information. All index and index-based constraint information (primary and foreign keys), as well as column-level constraints (NOT NULL, check constraints), is left behind. Very little reshuffling of data pages is required, although the server does some work within the original table’s index pages to remove all references to data that is no longer in the table. While the total workload is small, I recommend locking the table in exclusive mode before detaching the fragment to avoid the possibility of running into a lock overflow condition.
Detaching a fragment completes very quickly. The new data can then be loaded into a separate table and attached to the original table as a new fragment, with the table’s fragmentation expression altered to reflect the range of data the new fragment is to contain.
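The detach-then-attach cycle amounts to a sliding window over date ranges. The toy Python model below sketches it under stated assumptions: one fragment per month, fragments keyed by hypothetical dbspace names, and a simple rejection of overlapping ranges or out-of-range rows (a simplification, not a description of what the server does in those cases). The point it illustrates is that purging a period unlinks a whole fragment rather than deleting rows one by one.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple

Row = Tuple[date, str]


@dataclass
class Fragment:
    lo: date  # fragment expression: lo <= key < hi
    hi: date
    rows: List[Row]


class SlidingWindowTable:
    """Toy model of rolling a date-fragmented table forward one period."""

    def __init__(self) -> None:
        # Keyed by a hypothetical dbspace name, one fragment per period.
        self.fragments: Dict[str, Fragment] = {}

    def row_count(self) -> int:
        return sum(len(f.rows) for f in self.fragments.values())

    def detach(self, dbspace: str) -> List[Row]:
        # Unlink the whole fragment and hand its rows back as a standalone
        # "table". No row in any other fragment is visited, unlike a
        # DELETE that must find and remove the old rows one by one.
        return self.fragments.pop(dbspace).rows

    def attach(self, dbspace: str, lo: date, hi: date, rows: List[Row]) -> None:
        # Adopt a pre-loaded table as a new fragment covering [lo, hi).
        # This toy simply rejects overlapping ranges and out-of-range rows.
        for frag in self.fragments.values():
            if lo < frag.hi and frag.lo < hi:
                raise ValueError(f"[{lo}, {hi}) overlaps an existing fragment")
        if any(not (lo <= key < hi) for key, _ in rows):
            raise ValueError("some rows fall outside the new fragment's range")
        self.fragments[dbspace] = Fragment(lo, hi, rows)


table = SlidingWindowTable()
table.attach("dbs_jan", date(1998, 1, 1), date(1998, 2, 1), [(date(1998, 1, 5), "old")])
table.attach("dbs_feb", date(1998, 2, 1), date(1998, 3, 1), [(date(1998, 2, 7), "kept")])

purged = table.detach("dbs_jan")  # the detached "table" can now be dropped
table.attach("dbs_mar", date(1998, 3, 1), date(1998, 4, 1), [(date(1998, 3, 9), "new")])
print(sorted(table.fragments), table.row_count())  # ['dbs_feb', 'dbs_mar'] 2
```

The cost of `detach` here does not depend on how many rows the remaining fragments hold, which mirrors why detaching is so much lighter than a DELETE of the same rows.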
To reduce the overhead of purging data by detaching a fragment even further, drop all indexes on the table before detaching the fragment. Once the fragment has been detached, re-create the indexes; rebuilding them from scratch produces compact B+ tree structures, and your indexes will perform better as a result.
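The order of operations recommended here can be captured in a small script generator. This is a minimal sketch: the table, dbspace, index, and scratch-table names are hypothetical, and the statement text follows the general Informix forms of LOCK TABLE, DROP INDEX, ALTER FRAGMENT ... DETACH, and CREATE INDEX, which you should check against the SQL syntax reference for your server version (XPS included). Transaction and UNLOCK handling is deliberately left out because it depends on the database’s logging mode.

```python
from typing import List, Sequence, Tuple

# (index name, column list) pairs; all names used below are hypothetical.
IndexDef = Tuple[str, str]


def purge_fragment_script(table: str, dbspace: str, scratch_table: str,
                          indexes: Sequence[IndexDef]) -> List[str]:
    """Return SQL statements, in order, for purging one fragment.

    1. Lock the table exclusively to avoid a lock overflow.
    2. Drop the indexes so the detach has no index pages to clean up.
    3. Detach the fragment into a scratch table, then drop that table.
    4. Re-create the indexes, which rebuilds them compactly.
    """
    stmts = [f"LOCK TABLE {table} IN EXCLUSIVE MODE;"]
    stmts += [f"DROP INDEX {name};" for name, _ in indexes]
    stmts.append(f"ALTER FRAGMENT ON TABLE {table} DETACH {dbspace} {scratch_table};")
    stmts.append(f"DROP TABLE {scratch_table};")
    stmts += [f"CREATE INDEX {name} ON {table} ({cols});" for name, cols in indexes]
    return stmts


for stmt in purge_fragment_script("sales", "dbs_jan", "sales_jan_old",
                                  [("ix_sales_date", "sale_date")]):
    print(stmt)
```

Generating the script rather than typing it by hand keeps the drop-indexes, detach, rebuild ordering consistent from one purge cycle to the next.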