Query-based Paging Strategies
The key to implementing efficient query-based paging is to limit the I/O to the database so that you fetch only the exact page(s) that you need. Simple approaches query for the entire result set and then iterate over it until they reach the target page, or they use JDBC to fetch from a given row, for example by using the JDBC method
. This guarantees that the only data they serve to the client is the requested page. The ValueListHandler pattern from Sun Microsystems adopts this approach. But beware, you may not incur the cost of serializing redundant data into the application server but the DBMS will have to assemble your results temporarily so that it can move/cursor to the target start row.
One way to guarantee that you never assemble or read redundant data is to manipulate the WHERE clause on your search query to explicitly exclude data that you don't need. In other words, you narrow the query to the requested page. This approach requires that you pass additional context information back and forth on your page requests. This is so that the paging mechanism can adjust the queries automatically to account for data that's already been fetched.
Your page query should be carefully constructed to account for the following:
- Paging direction: Forward/next or back/previous
- Search criteria: Whether entered by the user or as part of your normal business use case
- The target page: Return rows for the requested page only
- Page size: Number of rows in a page
Paging direction is important if you want to be able to scroll both forwards and backwards across your result data. Caching as you go can provide back functionality. However, this works well only for static data. If your data is volatile, your cache data may be out of date by the time you page back. The only way to guarantee correct results is to issue another query to fetch the previous page of data. For cacheless paging, this means using SQL ORDER BY and WHERE clauses. If one or more ORDER BY columns specify the forward paging order, you can page backwards by reversing the ORDER BY directions in conjunction with the appropriate WHERE clause.
Basic business data searches usually entail some sort of dynamic WHERE clause construction so that user-entered criteria may be incorporated in the query. With query-based paging, the WHERE clause includes the search criteria as well as additional semantics to narrow the result set to the correct page. Use row identifiers to limit and qualify your page (see Figure 1). A row identifier uniquely identifies a row in a result set and consists of one or more columns from the result set. Think of it as a logical index into you result set.
|Figure 1: Query-based Paging Using Row Identifiers|
Typically, your row ID can be a primary key. However, depending on the data, it may be a combination of primary key columns and/or other columns. A page is denoted by a start and end row ID pair: the first row and last row in the page, respectively. To page forward (next), adjust the query to fetch all rows where the row ID is greater than the last row ID of the current page. To page back, query for all rows where the row ID is less than the first row ID of the current page.
This mechanism works fine for single-column, simple row IDs and ascending forward paging order, but searches are rarely that simple. In some cases, the row ID is a combination of columns and normal forward paging direction is a mixture of ascending (ASC) and descending (DESC) ORDER BY columns. The trick is to ensure that all your ORDER BY columns are included in the WHERE clause (including the row ID columns) and all the row ID columns are included in the ORDER BY, adjusted for forwards (paging next) and backwards (paging back) ordering.
Row-ID-based Paging Algorithm
Take a basic query example that selects three columns, col1 (VARCHAR), col2 (INTEGER), and col3 (TIMESTAMP), from table foo. To fetch the first page, issue the following SQL:
ORDER BY col2 ASC ,col3 DESC
In this example, the search criteria supplied by the user is applied to col1. The row ID is comprised of col1 and col2, and the normal forward paging order is defined as col2 ascending and col3 descending.
JDBC provides a function to limit the number of rows returned for a query, Statement::setMaxRows(int max). To fetch the first page, simply issue the basic query and set the max rows to your page size. Use this for every paging query as it guarantees that you fetch only one page of data for any given request. For optimum performance, use
Statement::setFetchSize(int rows) to set the fetch size to match the page size. This guarantees that the page will be fetched in one low-level IO request between the driver and the DBMS.
To fetch the next page, adjust the WHERE clause to exclude the first page by fetching all rows that match the search criteria and that are logically greater than the last row ID of the first page. The row ID part of the WHERE clause is constructed from the specified row ID columns (col2 and col3) for the query, and the row ID values are substituted from the first page.
Fetch the next page:
WHERE col1=? AND
( (col2 > ? ) OR (col2 = ? AND col3 < ? ) )
ORDER BY col2 ASC ,col3 DESC
To fetch the previous page, flip the ORDER BY's to fetch backwards and adjust the WHERE clause to exclude unwanted rows by fetching rows that are logically less than the row ID of the first row of the current page.
Fetch the previous page:
WHERE col1=? AND ( (col2 < ? ) OR (col2 = ? AND col3 > ? ) )
ORDER BY col2 DESC ,col3 ASC
In your Java code, ensure that you reorder the results to normal forward ordering before serving the page to the client.
This paging algorithm works seamlessly for very large result sets and arbitrarily complicated or unbounded queries, provided you use the JDBC
Statement::setMaxRows() function and construct the WHERE and ORDER BY clauses appropriately. For optimum performance, insure that your WHERE clause columns are indexed.