ince the beginning of the year I’ve received multiple emails from DevX readers regarding a discussion titled “How can I retrieve set by set records from a table?” on the DevX Oracle discussion group. The discussion revolves around a reader who wants to retrieve rows in sets of 10 and each time he or she retrieves another set the 10 records must not be previously obtained (i.e., the first time the reader gets rows 1 to 10; the second time, 11 to 20 without getting rows 1 to 10 again; the next time, 21 to 30, etc.).
The reader interest has been so high and the number of erroneous approaches so prevalent that I have decided to dedicate this 10-Minute Solution to describing a solution to the reader’s request.
How do I retrieve rows set by set from a database without getting previously obtained rows in subsequent sets?
Use Oracle’s package structure as a holder for session-persistent data in combination with the new Oracle 8i BULK COLLECT feature and LIMIT keyword.
In this 10-Minute Solution, I am going to use the EMP table in the query examples. This table belongs to the SCOTT schema of the Oracle starter database (provided with Oracle database server versions 7.3 to 9i).
Here is the table’s structure:
Name Null? Type———– ——– ————EMPNO NOT NULL NUMBER(4)ENAME VARCHAR2(10)JOB VARCHAR2(9)MGR NUMBER(4)HIREDATE DATESAL NUMBER(7,2)COMM NUMBER(7,2)DEPTNO NUMBER(2)
Run the following query to get a look at the data in the EMP table:
select Empno, Ename, Job, Mgr, Hiredate, Salfrom Emp;
The following listing shows the output of the above code:
EMPNO ENAME JOB MGR HIREDATE SAL——- ———- ——— ———- ——— ———- 7369 SMITH CLERK 7902 17-DEC-80 800 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 7521 WARD SALESMAN 7698 22-FEB-81 1250 7566 JONES MANAGER 7839 02-APR-81 2975 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 7698 BLAKE MANAGER 7839 01-MAY-81 2850 7782 CLARK MANAGER 7839 09-JUN-81 2450 7788 SCOTT ANALYST 7566 19-APR-87 3000 7839 KING PRESIDENT 17-NOV-81 5000 7844 TURNER SALESMAN 7698 08-SEP-81 1500 7876 ADAMS CLERK 7788 23-MAY-87 1100 7900 JAMES CLERK 7698 03-DEC-81 950 7902 FORD ANALYST 7566 03-DEC-81 3000 7934 MILLER CLERK 7782 23-JAN-82 130014 rows selected.
The query returns only 14 rows, but that is enough if you want to retrieve records in sets of three (i.e., if you choose the value of N = 3). For all the code examples, you will retrieve the names and salaries (ENAME and SAL columns) of employees whose salaries are greater than or equal to $750.
The following query returns all the results in one batch:
select Ename, Salfrom Empwhere Sal >= 750;
The following listing shows the output from the above code:
ENAME SAL———- ———-SMITH 800ALLEN 1600WARD 1250JONES 2975MARTIN 1250BLAKE 2850CLARK 2450SCOTT 3000KING 5000TURNER 1500ADAMS 1100JAMES 950FORD 3000MILLER 130014 rows selected.
The query ran almost instantly and the results look great. Again, you have only 14 rows.
Erroneous Solutions: What You Should Not Do and WhyA common mistake among the erroneous solutions for retrieving record sets row by row is trying to utilize the Oracle pseudocolumn ROWNUM. For each row returned by a query, the ROWNUM pseudocolumn returns a number that indicates the order in which Oracle selects the row or a set of joined rows from a table. The first row selected has a ROWNUM of 1, the second has 2, and so on.
The first erroneous solution posted in the discussion used the following syntax:
select Ename, Salfrom Empwhere sal >= 750and rownum >= 1and rownum <= 3;
The following listing shows the output:
ENAME SAL———- ———-SMITH 800ALLEN 1600WARD 1250
The obvious question is how do you then get the next N (three) rows? The author suggested that you just need to keep adding N to the lower and upper boundaries, like this:
select Ename, Salfrom Empwhere sal >= 750and rownum >= 4and rownum <= 6;
If the author had run his suggested solution more than once, he would have noticed that such a query returns nothing. The reason is right in ROWNUM’s definition: if you want to get only rows 4 through 6, that means you don’t want rows 1 to 3. But that’s the only way to get to rows 4 to 6 with this solution.
Solution 2 offered a slightly more complex syntax that aimed to correct the ROWNUM problem by moving ROWNUM filtering from the WHERE clause to the HAVING clause:
select rownum, max(Ename) as Ename, max(Sal) as Salfrom Empwhere sal >= 750group by rownumhaving rownum > = 1and rownum < = 3;
The following listing shows the output:
ROWNUM ENAME SAL——– ———- ———- 1 SMITH 800 2 ALLEN 1600 3 WARD 1250
The solution then called for the user to query to get the next set:
select rownum, max(Ename) as Ename, max(Sal) as Salfrom Empwhere sal >= 750group by rownumhaving rownum > = 4and rownum < = 6;
The following listing shows the output of the above code:
ROWNUM ENAME SAL——— ———- ———- 4 JONES 2975 5 MARTIN 1250 6 BLAKE 2850
These results look much better than those for Solution 1, but they are far from perfect. Why?
First of all, this solution uses a GROUP BY clause so the SQL engine will accept the HAVING clause syntax. But using GROUP BY necessitates the MAX() function for each of the columns in the FROM list. That is totally unnecessary and inefficient because it leads to an additional SORT GROUP BY operation. Most importantly, filtering in the HAVING clause applies only after all of the matching rows are retrieved and sorted. So if 100,000 rows match the criteria in the WHERE clause, every time you ask for the next 10 rows, the database will retrieve and sort all 100,000 and then filter them by row number. Extremely inefficient!
In addition, the solution assumes that the database will return rows in the same order for the same SQL query every time. That’s a dangerous assumption! The user cannot guarantee that the database will retrieve rows in the same sequence unless it is enforced by the ORDER BY clause. The ORDER BY clause would force sorting of the whole set of retrieved rows, which is time-consuming and may not be necessary at all. In the example of 100,000 rows, every time you ask for the next 10, the database will retrieve all 100,000 rows, sort them (SORT ORDER BY operation), and then return a batch of 10 rows. That does not sound good at all!
Sort order is not important in certain situations, and an index could be used to avoid using the ORDER BY clause. For example, if you create a composite index on the SAL and ENAME columns, you would get rows sorted by salary and then by name without using the ORDER BY clause and the SORT ORDER BY operation.
Try using an index in your queries:
create index Emp_Sal_Enameon emp(sal, ename)compute statistics;
Even with the GROUP BY clause enforcing the same sequence in the retrieved rows, this solution has a potential problem. Let’s say you’re searching multiple Web sites for a new house. After all, the house market is still strong–unlike the NASDAQ. You set your preferences to sort houses by price and display 10 per page. While you are browsing through the first 10, the database gets updated with three new inexpensive offers. (Living in the San Francisco Bay Area has erased the word cheap from my vocabulary.) That would shift some of the houses you’re viewing three positions down. So what you actually are looking at are no longer rows 1 through 10. You’re seeing the first, third, and fourth, and 7 through 13 (the newly inserted bargains take the second, fifth, and sixth positions).
This should not be a problem at all. After all, databases have to be up-to-date. Plus, the Web site cannot lock the database while you view the data; you are one of many users and the database cannot work exclusively for you. However, what would happen when you click the Next button to get the next 10 records? You would see the first three (11, 12, and 13) repeated again. That would either imply software problems or duplicates (or dupes) in that Web site’s listings.
Packages to the RescueClearly, the problem with the previously discussed solutions is that they try to retrieve results again and again from the beginning. Wouldn’t making the database remember what it had shown in previous batches be better? It would completely eliminate the inefficiency problem that unnecessary sorting causes, and it would even take care of the dupes problem.
What would enable such a mechanism? Packages. I am always surprised at how many Oracle programmers move from SQL to PL/SQL, learn how to write procedures and functions, and yet stop one step short of packages. The author of my favorite Oracle book (“Oracle PL/SQL Programming”), Steven Feuerstein, wrote “Packages are among the least understood and most underutilized features of PL/SQL. That is a shame, because the package structure is also one of the most useful constructs for building well-designed PL/SQL-based applications.”
Packages enable you to keep data in memory (for the quickest access) and share it across all your application components. The data would persist for the duration of the session. Unless you want to persist data to the table(s), permanent or temporary, and worry about how to maintain/clean it up later, packages are exactly what you need.
Develop an API to Perform Retrieval of the Next N Rows
The API I propose will be based on a package with one cursor and multiple procedures to open, close, and fetch rows from the cursor. The cursor stays open throughout the session, and it remembers the fetch history because it automatically sets its pointer to the next available row: fetching the first 10 rows would move the pointer to row 11, the next 10 to row 21, etc.
The package specification consists of five procedures:
- OpenCursor
- CloseCursor
- RetrieveRows
- SetRows – to set the default value of rows to fetch
- SetSalary – to set the default value of the salary parameter that will be passed to the cursor
- tName – PL/SQL table of Emp.Ename type
- tSal – PL/SQL table of Emp.Sal type
create or replace package GetNextSet as type tName is table of Emp.Ename%type index by binary_integer; type tSal is table of Emp.Sal%type index by binary_integer; procedure CloseCursor; procedure OpenCursor (pSalary pls_integer := null); procedure RetrieveRows (pRows pls_integer := null, pSalary pls_integer := null); procedure SetRows (pRows pls_integer); procedure SetSalary(pSalary pls_integer);end GetNextSet;/
The RetrieveRows procedure does the actual work of retrieving the next N rows. It uses the new Oracle 8i BULK COLLECT clause of the FETCH INTO command with the LIMIT keyword. The BULK COLLECT clause allows us to bulk-fetch rows from the cursor into the vName collection of employees’ names and the vSal collection of salaries, while LIMIT keyword sets the size of the batch:
fetch csrEmployees bulk collect into vName, vSal limit vRows;
That’s it! That one line of code does all the work! Passing batch size as a parameter to RetrieveRows procedure allows you to change it from one fetch to another (i.e., give me the first 10 rows, then the next 25 rows, etc.).
Listing 1 shows the GetNextSet package body.
Use the GetNextSet Package
You use the GetNextSet package by calling its procedures from SQL*Plus. First, you need to enable screen output by setting the SERVEROUTPUT value to ON:
set serveroutput on
Now, start retrieving rows of all employees with salaries greater or equal to 800 in chunks of three (remember, you have 14 total rows in the Emp table):
begin GetNextSet.RetrieveRows(3, 800);end;/
Here is the output from the above code:
Opening the cursor.SMITH 800JAMES 950ADAMS 1100
You’ve sorted the first three rows by salary and then by name, because the cursor definition in your SQL is using the Emp_Sal_Ename index as you planned. Now retrieve the next set:
begin GetNextSet.RetrieveRows;end;/
Here is the output of the above code:
MARTIN 1250WARD 1250MILLER 1300
Only 14 rows are returned and you’re probably getting a little impatient. Change the size of the set from three to six:
begin GetNextSet.RetrieveRows(6);end;/
Now you’re rolling:
TURNER 1500ALLEN 1600CLARK 2450BLAKE 2850JONES 2975FORD 3000
Now, perform the last fetch:
begin GetNextSet.RetrieveRows;end;/
Here are the results:
SCOTT 3000KING 5000All rows retrieved. Closing the cursor.
To try out other procedures in the GetNextSet package, execute the following PL/SQL block, which changes your default settings for set size to five and for salary to 2000:
begin GetNextSet.SetRows(5); GetNextSet.SetSalary(2000);end;/
Now you will start retrieving all high-salaried employees in chunks of five:
begin GetNextSet.RetrieveRows;end;/
Here is the output:
Opening the cursor.CLARK 2450BLAKE 2850JONES 2975FORD 3000SCOTT 3000
The next fetch:
begin GetNextSet.RetrieveRows;end;/
appears to be your last one:
KING 5000All rows retrieved. Closing the cursor.
Looks good and runs instantly! Of course, the table is tiny?only 14 rows. What would happen in a table with hundreds of thousands of rows? I conducted simple performance testing to find out.
As I mentioned previously, the packages solution eliminates the need to retrieve or sort for the consequent requests. The cursor keeps the pointer on the next available row, so you get just the next N rows from the cursor every time. Easy and simple!
Performance Testing Results
I populated a table with 200,000 rows and retrieved rows in chunks of 10 with a total of 100,000 matching my filter criteria. Opening the cursor took 0.2 seconds, and fetching the next 10 rows took 0.15 seconds?regardless of whether it was the first 10 rows or a set in the middle of the 100,000 rows. I consider these good results, particularly in light of Solution #2 from the Erroneous Solutions section. Executing Solution #2 code got slower for subsequent requests and it became really painful as I got to the middle of the 100,000 rows. It took 15-17 seconds, which is completely unacceptable for retrieving just 10 rows.
What to Improve If You Are on Oracle 9i
Oracle 8i imposes a restriction on the usage of the BULK COLLECT clause: the target cannot be a collection of records; it has to be individual collections. That’s why I declared two types (tName and tSal) and initialized two collections (vName and vSal). Oracle lifted this restriction in 9i. You need to just declare a type and initialize a collection of that type.
Two new PL/SQL features in Oracle 8i that I like are Native Dynamic SQL (see New PL/SQL Features in Oracle 8i: Part I) and Bulk Binds (see New PL/SQL Features in Oracle 8i: Part II). The problem is you cannot use them together in 8i. If you open a cursor with Native Dynamic SQL, you cannot perform bulk-fetching. That’s why I couldn’t use Native Dynamic SQL and had to hardcode the cursor in my package body. The current code works only against Emp table and retrieves only two fields, name and salary.
But if you are on Oracle 9i, you could make the cursor definition code more generic by constructing a SQL string using passed in parameters: table name(s), list of fields, WHERE or ORDER BY clauses?all of it could be parameterized. Once your package is more generic, you can use it in many places. (Just to clarify: building a cursor’s SQL dynamically can be done in Oracle 8i also, but then you can not use the BULK BINDS/BULK COLLECT feature and you have to settle for fetching rows one at a time, which is not nearly as efficient as bulk-fetching.)
The good thing about packages is that the existing package structure allows you to add more procedures to support additional functionality (e.g., skipping forward or backward, getting the total amount of sets (pages) with a specified set size, and more). That’s just another example that shows why packages is the way to go in PL/SQL development.