Paging in J2EE: Manage Large Result Sets Efficiently

Pagination is the simplest and most common way to break up large amounts of data into more manageable chunks. Pagination is a key part of Web site design, both from the UI perspective (to manage limited screen real estate) and from the server perspective (to process large result sets efficiently, without causing resource spikes or servicing delays). In J2EE, efficient pagination (and good UI design) are essential for handling large query result sets in a resource-pooled environment.

Specifically, the following conditions make pagination necessary:

  • Screen real estate is limited. Your JSP/client UI may have insufficient space to display the entire results at once and your user may not be able to find what he or she wants quickly in a very large data set.
  • Your resources are pooled or constrained. Result sets are too large to be managed reasonably within your connection and memory limitations.

Paging Strategies That Work
Basic querying and paging strategies can start out simple but quickly lead to disaster if you fail to consider query evolution and data growth. Open-ended or unbounded queries that perform well over small test data sets may degrade sharply against real data. If you don’t manage your precious DBMS and memory resources, your problems will only compound under load.

Efficient data retrieval and paging is a trade-off between memory usage and response-time requirements. Broadly speaking, most roll-your-own J2EE paging mechanisms fall into two basic categories:

  • Cache based: Results are cached for fast access in subsequent pages.
  • Query based: Results are fetched from the DBMS on demand as the user pages.

Cache-based approaches cache the query results over and above what the database provides, in an intermediate tier (on the HTTP session, in the server using a Stateful Session Bean or in a home-grown cache). Subsequent page requests hit the cache rather than the database. Such cache-based mechanisms are memory intensive and work well for low-volumes, recurring queries, and small to medium-sized result sets. The cache-hit rates are high and the I/O overheads are low, yielding good response times.

Query-based approaches don’t use caches. They deliver pages on demand, direct from the DBMS to the client. They service additional pages by issuing more queries to the DBMS rather than pulling data from a cache. These approaches discard or skip unwanted rows from previous or future pages using low-level JDBC functions and context information from the last fetched page. Query-centric paging is memory efficient but requires additional round trips to the database, leading to slightly slower response times overall.

More sophisticated mechanisms adopt a hybrid approach. They constrain cache sizes by caching a fixed number of pages ahead, maintaining bounded cache windows, and using efficient query mechanisms and offline threads to backfill the results transparently to the client.

Caching vs. Query Trade-offs
Cache-based approaches are fast, but they have a few pitfalls. For example, unbounded or open-ended queries for large result sets can lead to uncontrolled memory growth as the results are serialised and cached. Large result sets take a long time to service and take up valuable connection and session resources. Whenever a connection is consumed for long periods of time, the number of connections available in your J2EE shared connection pool is effectively reduced. To cope with high load, your system should aim to service your requests as quickly as possible. If your requests take a long time to process, your overall throughput will drop. You don’t want to occupy your resources with long-lived queries, because you won’t have any connections and threads left to do anything else.

Long-lived queries can also lead to connection timeouts if your query takes more time to execute than your preset JDBC connection time limit allows. Avoid this in cache-based strategies by narrowing or constraining your search criteria so that your DBMS queries (and results) are bounded.

Caching is most beneficial when the cached data is hit repeatedly. However, if your user searches are consistently yielding distinct results, your caches can’t be reused across requests. Worse, if the user rarely pages beyond the first page, your caching within the session is for naught. You should carefully manage caches so that they don’t grow too large and they expire promptly when they’re no longer required.

Hybrid or query-based approaches are more reliable than cache-based alternatives. They have slightly slower response times but can scale transparently for large result sets and high request volumes. An unbounded query will take a long time to execute but won’t exceed your memory constraints or take too long to execute. The DBMS connection is relinquished once the required data (usually the first one or two pages) has been fetched. This means that the connection is held for the bare minimum before being returned to the connection pool. The only downside is that additional context information must be passed back and forth on your page requests.

Table 1 lists the issues you need to consider when deciding on a paging approach.

Conditions for Pagination Cache-based Approach Hybrid/Query-based Approach
Is your result data bounded or unbounded? Applicable for fixed or bounded result sizes; search criteria is fixed or constrained Applicable for unbounded results sets (such as searches with loose criteria) since the implementation inherently limits result size
How big are your result sets? Good for small to medium-sized Good for medium-sized to large
What are your response-time requirements? When response time is critical When response time is less critical
How often is the query issued? Do you fetch the same data repeatedly, or is your result data different for each request? Applicable for when the same data is requested repeatedly Applicable for when the search criteria and results vary from request to request
How volatile is your (static vs. dynamic)? What are your requirements? Unchanging or static data can be serviced reliably from caches If data is dynamic, query every time to return the most up-to-date data
Table 1. Cache-based vs. Hybrid/Query-based Paging Strategies

Query-based Paging Strategies
The key to implementing efficient query-based paging is to limit the I/O to the database so that you fetch only the exact page(s) that you need. Simple approaches query for the entire result set and then iterate over it until they reach the target page, or they use JDBC to fetch from a given row, for example by using the JDBC method ResultSet::absolute(). This guarantees that the only data they serve to the client is the requested page. The ValueListHandler pattern from Sun Microsystems adopts this approach. But beware, you may not incur the cost of serializing redundant data into the application server but the DBMS will have to assemble your results temporarily so that it can move/cursor to the target start row.

One way to guarantee that you never assemble or read redundant data is to manipulate the WHERE clause on your search query to explicitly exclude data that you don’t need. In other words, you narrow the query to the requested page. This approach requires that you pass additional context information back and forth on your page requests. This is so that the paging mechanism can adjust the queries automatically to account for data that’s already been fetched.

Your page query should be carefully constructed to account for the following:

  • Paging direction: Forward/next or back/previous
  • Search criteria: Whether entered by the user or as part of your normal business use case
  • The target page: Return rows for the requested page only
  • Page size: Number of rows in a page

Paging direction is important if you want to be able to scroll both forwards and backwards across your result data. Caching as you go can provide back functionality. However, this works well only for static data. If your data is volatile, your cache data may be out of date by the time you page back. The only way to guarantee correct results is to issue another query to fetch the previous page of data. For cacheless paging, this means using SQL ORDER BY and WHERE clauses. If one or more ORDER BY columns specify the forward paging order, you can page backwards by reversing the ORDER BY directions in conjunction with the appropriate WHERE clause.

Basic business data searches usually entail some sort of dynamic WHERE clause construction so that user-entered criteria may be incorporated in the query. With query-based paging, the WHERE clause includes the search criteria as well as additional semantics to narrow the result set to the correct page. Use row identifiers to limit and qualify your page (see Figure 1). A row identifier uniquely identifies a row in a result set and consists of one or more columns from the result set. Think of it as a logical index into you result set.

Figure 1: Query-based Paging Using Row Identifiers

Typically, your row ID can be a primary key. However, depending on the data, it may be a combination of primary key columns and/or other columns. A page is denoted by a start and end row ID pair: the first row and last row in the page, respectively. To page forward (next), adjust the query to fetch all rows where the row ID is greater than the last row ID of the current page. To page back, query for all rows where the row ID is less than the first row ID of the current page.

This mechanism works fine for single-column, simple row IDs and ascending forward paging order, but searches are rarely that simple. In some cases, the row ID is a combination of columns and normal forward paging direction is a mixture of ascending (ASC) and descending (DESC) ORDER BY columns. The trick is to ensure that all your ORDER BY columns are included in the WHERE clause (including the row ID columns) and all the row ID columns are included in the ORDER BY, adjusted for forwards (paging next) and backwards (paging back) ordering.

Row-ID-based Paging Algorithm
Take a basic query example that selects three columns, col1 (VARCHAR), col2 (INTEGER), and col3 (TIMESTAMP), from table foo. To fetch the first page, issue the following SQL:

SELECT 	col1,col2,		col3 FROM 	foo WHERE  	col1=? ORDER BY 	col2 ASC ,col3 DESC 

In this example, the search criteria supplied by the user is applied to col1. The row ID is comprised of col1 and col2, and the normal forward paging order is defined as col2 ascending and col3 descending.

JDBC provides a function to limit the number of rows returned for a query, Statement::setMaxRows(int max). To fetch the first page, simply issue the basic query and set the max rows to your page size. Use this for every paging query as it guarantees that you fetch only one page of data for any given request. For optimum performance, use Statement::setFetchSize(int rows) to set the fetch size to match the page size. This guarantees that the page will be fetched in one low-level IO request between the driver and the DBMS.

To fetch the next page, adjust the WHERE clause to exclude the first page by fetching all rows that match the search criteria and that are logically greater than the last row ID of the first page. The row ID part of the WHERE clause is constructed from the specified row ID columns (col2 and col3) for the query, and the row ID values are substituted from the first page.

Fetch the next page:SELECT 	col1,col2,		col3 FROM 	foo WHERE  	col1=?  AND 		( (col2 >  ?  )  OR  (col2 = ?  AND col3 <  ?  )  )  ORDER BY 	col2 ASC ,col3 DESC 

To fetch the previous page, flip the ORDER BY's to fetch backwards and adjust the WHERE clause to exclude unwanted rows by fetching rows that are logically less than the row ID of the first row of the current page.

Fetch the previous page:SELECT 	col1,col2,		col3 FROM 	foo WHERE  	col1=?  AND ( (col2 <  ?  )  OR  (col2 = ?  AND col3 >  ?  )  )  ORDER BY 	col2 DESC ,col3 ASC 

In your Java code, ensure that you reorder the results to normal forward ordering before serving the page to the client.

This paging algorithm works seamlessly for very large result sets and arbitrarily complicated or unbounded queries, provided you use the JDBC Statement::setMaxRows() function and construct the WHERE and ORDER BY clauses appropriately. For optimum performance, insure that your WHERE clause columns are indexed.

Put Query-based Paging into Practice
Once you have established your basic query-based paging algorithm, put it into practice by creating a generic, reusable paging component that can be used for any arbitrary simple or complex query. Your paging framework should provide a clear interface, a simple API, and it should encapsulate the low-level internals of the paging implementation and algorithm. It should handle data fetching, transport, row IDs and parameter passing, query manipulation, as well as simple search criteria parameter substitution.

One way to set up this framework is to adopt a configuration-driven or declarative approach in which you pre-define the search as a logical paginated view in a properties file and supply the actual search parameters at runtime with the request. In your configuration file, you can make the view explicitly identify the row IDs, ORDER BY columns, and any other information deemed necessary for your paging implementation. To issue a search, the caller doesn't have to do any fancy SQL footwork. It just identifies and loads the desired view using the view's unique identifier and then issues the search/fetch request along with any user-supplied criteria.

A view is just a logical window across one or more physical DBMS tables and views. A paginated view is a view that contains additional discriminating information (such as row IDs) that enables it to be paged. Configure your paginated view in a properties file and define the forward paging direction (ORDER BY's) and row ids.

The following steps walk you through the creation of a paging component. For a complete package of all the required Java classes, download the source code.

Step 1. Create/configure your paginated view

# Example viewexample.view=fooexample.pagesize=5example.where=col1=?example.rowids=col2 ASC,col3 DESC

Step 2. Create a class to represent your view, PageDefn
Create a Java class (PageDefn) to represent your paginated view. You can either load it from your properties file or create it on the fly in your code. Configuration is the preferred approach, as the caller doesn't have to know about the view internals or table meta-data. Your PageDefn contains all the information relevant to your paginated view and is responsible for handling the SQL construction details:

public class PageDefn extends ViewDefn {	public interface PageAction {		public static final int FIRST = 0;		public static final int NEXT = 1;		public static final int PREVIOUS = 2;		public static final int CURRENT = 3;	}	protected int pageSize;	protected ColumnDesc[] rowIds;	public PageDefn() {		super();		rowIds = null;		pageSize = 50;	}	public PageDefn(String viewname) {		super(viewname);		rowIds = null;		pageSize = 50;	}...}

Step 3: Create a paging data access object (DAO)
Create a special-purpose PagingDAO (see Listing 1: PageDefn.java) that supports parameter substitution and uses Prepared statements. Prepared statements are faster than normal queries as they can be pre-compiled by the DBMS. Your PagingDAO executes the paging query and handles the results. The actual SQL construction is delegated to the PageDefn. This separation of concerns allows you to evolve the PageDefn and its subclasses to support more sophisticated query configuration independently of the DAO.

Ensure that your DAO closes all database resources (ResultSets, Statements, and Connections) once the query has been executed. Put your close code inside a FINALLY clause so that your resources will close even when an exception is thrown.

Step 4. Create a PageContext and Page command
Create a PageCommand class to encapsulate your page request and hold the results. Client servlets and actions will use the PageCommand to interact with your paging framework. Your command should provide methods to do the following:

  • Specify the target view (PageDefn)
  • Supply optional search criteria (query parameters)
  • Specify the page action (FIRST, CURRENT, NEXT, or BACK)
  • Access the page results

In addition, your PageCommand should encapsulate any context information required by the paging implementation to support your paging algorithm. Create a context object that holds your paging request action and results, and encapsulates your paging internals:

public class PageContext implements Serializable {	protected Object page; // results	protected Object firstEntry; //  first row ID 	protected Object lastEntry; // last row id	protected int action; // paging action	protected PageContext prevContext; // previous context state	public PageContext() {		this.page = new Object[0];		this.firstEntry = null;		this.lastEntry = null;		this.action = PageDefn.PageAction.FIRST;	}	public Object[] getPage() {		return ((Collection) page).toArray();	}	public void setPage(Object page) {		this.page = page;	}	public int getAction() {		return action;	}		public void setAction(int action) {		this.action = action;	}}

Ensure that your PageContext object is Serializable, as it will be passed on the wire back and forth between the Servlet and middle tiers.

Create your PageCommand class to encapsulate your paging requests, and include the PageContext as a member variable:

public class PageCommand extends ViewCommand {	protected PageContext context;...}

The PageCommand class is just a data transfer object (DTO) that conveys request parameters and holds page results. Page data and row-IDs are held inside the PageContext object. The caller doesn't have to know about the PageContext attribute directly except to remember to re-use it across page requests for a given read request.

Step 5. Create a PagingService
Create a specialized paging service (see The COR Pattern Puts Your J2EE Development on the Fast Track for more information on COR services and configuration) for handling PageCommand requests. It will process PageCommands and pass them to the PagingDAO for processing:

public class PageService implements Service {		public Command process(Command command) throws ServiceException {		if (!command.getClass().equals(PageCommand.class)) {			return null;		}		PageCommand pcmd = (PageCommand) command;		PageContext context = pcmd.getContext();		PageDefn desc = (PageDefn) pcmd.getDefn();		// Get A DAO    		PagingDAO dao = PagingDAO.get();		// Issue Query, results are set into the context by the DAO		try {			dao.executeQuery(desc, context, true);			return pcmd;		} catch (DAOException e) {			throw new ServiceException(e);		}	}}

Step 6. Exercise your Page command!
To fetch the first page, create an instance of PageCommand, load your target view, set any user-supplied parameters and page action, and issue the command to the CORManager. The CORManager will transparently route the command to the PageService to handle:

// load view and set user supplied criteriaPageDefn d = PageDefnFactory.getPDF().createPageDefn(bundle,view);d.setParams(new Object[] { criteria });PageCommand cmd = new PageCommand(d);// fetch the first pagecmd.setAction(PageDefn.PageAction.FIRST);cmd = (PageCommand) CORManager.get().process(cmd);// process resultsPageContext context = cmd.getContext();// Process resultsObject[] rows = cmd.getContext().getPage();if (rows == null) return;for (int i = 0; i < rows.length; ++i) {	System.out.println("ROW(" + i + ") " + rows[i].toString());}// cache context to be reused for subsequent pages..getServletContext().setAttribute("context",cmd.getContext());

To fetch the next page, insure that you reuse the PageContext from the previous request. The PageContext contains all the information the paging mechanism needs for qualifying your queries. If you're managing the paging from the servlet tier, cache the PageCommand/PageContext object on the HttpSession so that you can reuse it when the user pages forward or back:

// Create PageDefintion..PageDefn d = PageDefnFactory.getPDF().createPageDefn(bundle, view);// Retrieve context from ServletContextPageContext c = (PageContext) getServletContext().getAttribute("context");// Create Page Command  PageCommand cmd = new PageCommand(d,c);cmd.setAction(PageDefn.PageAction.NEXT);cmd = (PageCommand) CORManager.get().process(cmd);// cache result on servlet contextgetServletContext().setAttribute("context",cmd.getContext());...

Current Page Number and Total Page Count
You may want to keep track of the current page number and the total page count. You can display this information to the user, as well as prevent paging past the first or last page. For query-based paging, the simplest approach is to use SQL to count up the total number of rows in the result using the SQL function COUNT(*) or COUNT(column name). You can derive the total number of pages by dividing the number of rows by the page size. You can issue the count query every time you request a page, but this may be too much. Underneath the covers, the DBMS will have to assemble your entire result set to count the results. As a compromise, issue the count query when the first page is requested. This way, the cost is incurred once per search/read request for the first page. In most databases, you will be able to append the COUNT(*) to your column list in your query for the first page. For the databases that don't allow this (like MySQL), issue a separate count query.

The first page will be slightly slower to fetch than other pages. However, your UI will be able to accurately show the total number of pages returned, as well as the current page number. If page count isn't important or the data is so volatile that the total page count isn't reliable, avoid the overhead altogether by skipping the count query.

To add page number and total page count support to your paging mechanism, extend the PageContext object to store and maintain page counts and extend the PagingDAO to issue an additional count query on receipt of a FIRST page request.

Paging in Perspective
Searching and querying over data is an integral part of most Web sites. Paging is an ideal way to manage limited server resources, large result sets, and congested UI real estate. The approach you chose, whether cache-based, query-based, or hybrid, depends on your use cases and data characteristics. Regardless, you should limit your memory and connection usage so that your paging architecture can scale efficiently to meet your performance targets.

Share the Post:
Share on facebook
Share on twitter
Share on linkedin

Overview

The Latest

your company's audio

4 Areas of Your Company Where Your Audio Really Matters

Your company probably relies on audio more than you realize. Whether you’re creating a spoken text message to a colleague or giving a speech, you want your audio to shine. Otherwise, you could cause avoidable friction points and potentially hurt your brand reputation. For example, let’s say you create a

chrome os developer mode

How to Turn on Chrome OS Developer Mode

Google’s Chrome OS is a popular operating system that is widely used on Chromebooks and other devices. While it is designed to be simple and user-friendly, there are times when users may want to access additional features and functionality. One way to do this is by turning on Chrome OS

homes in the real estate industry

Exploring the Latest Tech Trends Impacting the Real Estate Industry

The real estate industry is changing thanks to the newest technological advancements. These new developments — from blockchain and AI to virtual reality and 3D printing — are poised to change how we buy and sell homes. Real estate brokers, buyers, sellers, wholesale real estate professionals, fix and flippers, and beyond may