Browse DevX
Sign up for e-mail newsletters from DevX


Tame Beastly Data with Summary Tables-4 : Page 4




Building the Right Environment to Support AI, Machine Learning and Deep Learning

Update on Request
Examining our queries more carefully, we might:
  • Decide that query A is acceptable without the current day's data
  • Recognize that query B doesn't need the current day's data
  • Establish that query C runs fast enough on the raw data.

Hence, we could skip the overhead of a refresh immediate summary, and rebuild the summary each day. A refresh deferred summary table is a neat way to do this. It allows more flexible queries, but has its own limitations.

Refresh deferred summary tables are similar to refresh immediate, except they:
  • Operate asynchronously; they are refreshed only when you explicitly refresh them
  • May use scalar functions
  • May use other column functions, such as MIN and MAX
  • Do not require a COUNT(*) or COUNT_BIG(*)
  • May contain a HAVING clause
  • Cannot be used to automatically optimize static SQL queries
  • Can be used to automatically optimize dynamic SQL queries under certain circumstances (see sidebar "A Faster Disaster").

Listing 5 shows a refresh deferred definition. Remember, you can increase flexibility and reduce overhead by deferring the update.

Listing 5: Create a Deferred Summary Table CREATE SUMMARY TABLE TEST.GOOP_DEF AS ( SELECT Machine, DateMade, SUM(Weight) AS SumWeight, COUNT(Weight) AS CountWeight, COUNT(*) AS Bags, MIN(SerialNo) as MinSerial, MAX(SerialNo) as MaxSerial FROM TEST.GOOP GROUP BY Machine, DateMade ) DATA INITIALLY DEFERRED REFRESH DEFERRED @ create index TEST.GOOPDEFMDIX ON TEST.GOOP_DEF (Machine, DateMade) @

Note that the contents of goop_def are only updated upon the command

refresh table TEST.GOOP_DEF

meaning that the contents of goop_def are not guaranteed to be current.

Refresh deferred summary tables are good when:

  • You can refresh a summary table once and query it many times, due to the timing of the transactions and queries—for example, if you run queries on historical daily summaries or the source data arrives in bursts.
  • You need the extra functions that aren't available with refresh immediate.

So it's still a balancing act, and there's no single answer to all situations, but now you have two tools to cut your query times when collating large data sets. Decide whether your summary needs to be live, or simply refreshable, and let DB2 work smart—like Eddie.

Greg Nash is an integration engineer who likes finding solutions to complex problems.
Comment and Contribute






(Maximum characters: 1200). You have 1200 characters left.



Thanks for your registration, follow us on our social networks to keep up-to-date