Browse DevX
Sign up for e-mail newsletters from DevX


Retrieve Rows Set by Set from the Database—Without Duplicates-3 : Page 3




Building the Right Environment to Support AI, Machine Learning and Deep Learning

Erroneous Solutions: What You Should Not Do and Why
A common mistake among the erroneous solutions for retrieving record sets row by row is trying to utilize the Oracle pseudocolumn ROWNUM. For each row returned by a query, the ROWNUM pseudocolumn returns a number that indicates the order in which Oracle selects the row or a set of joined rows from a table. The first row selected has a ROWNUM of 1, the second has 2, and so on.

The first erroneous solution posted in the discussion used the following syntax:

select Ename, Sal from Emp where sal >= 750 and rownum >= 1 and rownum <= 3;

The following listing shows the output:

ENAME SAL ---------- ---------- SMITH 800 ALLEN 1600 WARD 1250

The obvious question is how do you then get the next N (three) rows? The author suggested that you just need to keep adding N to the lower and upper boundaries, like this:

select Ename, Sal from Emp where sal >= 750 and rownum >= 4 and rownum <= 6;

If the author had run his suggested solution more than once, he would have noticed that such a query returns nothing. The reason is right in ROWNUM's definition: if you want to get only rows 4 through 6, that means you don't want rows 1 to 3. But that's the only way to get to rows 4 to 6 with this solution.

Solution 2 offered a slightly more complex syntax that aimed to correct the ROWNUM problem by moving ROWNUM filtering from the WHERE clause to the HAVING clause:

select rownum, max(Ename) as Ename, max(Sal) as Sal from Emp where sal >= 750 group by rownum having rownum > = 1 and rownum < = 3;

The following listing shows the output:

ROWNUM ENAME SAL -------- ---------- ---------- 1 SMITH 800 2 ALLEN 1600 3 WARD 1250

The solution then called for the user to query to get the next set:

select rownum, max(Ename) as Ename, max(Sal) as Sal from Emp where sal >= 750 group by rownum having rownum > = 4 and rownum < = 6;

The following listing shows the output of the above code:

ROWNUM ENAME SAL --------- ---------- ---------- 4 JONES 2975 5 MARTIN 1250 6 BLAKE 2850

These results look much better than those for Solution 1, but they are far from perfect. Why?

First of all, this solution uses a GROUP BY clause so the SQL engine will accept the HAVING clause syntax. But using GROUP BY necessitates the MAX() function for each of the columns in the FROM list. That is totally unnecessary and inefficient because it leads to an additional SORT GROUP BY operation. Most importantly, filtering in the HAVING clause applies only after all of the matching rows are retrieved and sorted. So if 100,000 rows match the criteria in the WHERE clause, every time you ask for the next 10 rows, the database will retrieve and sort all 100,000 and then filter them by row number. Extremely inefficient!

In addition, the solution assumes that the database will return rows in the same order for the same SQL query every time. That's a dangerous assumption! The user cannot guarantee that the database will retrieve rows in the same sequence unless it is enforced by the ORDER BY clause. The ORDER BY clause would force sorting of the whole set of retrieved rows, which is time-consuming and may not be necessary at all. In the example of 100,000 rows, every time you ask for the next 10, the database will retrieve all 100,000 rows, sort them (SORT ORDER BY operation), and then return a batch of 10 rows. That does not sound good at all!

Sort order is not important in certain situations, and an index could be used to avoid using the ORDER BY clause. For example, if you create a composite index on the SAL and ENAME columns, you would get rows sorted by salary and then by name without using the ORDER BY clause and the SORT ORDER BY operation.

Try using an index in your queries:

create index Emp_Sal_Ename on emp(sal, ename) compute statistics;

Even with the GROUP BY clause enforcing the same sequence in the retrieved rows, this solution has a potential problem. Let's say you're searching multiple Web sites for a new house. After all, the house market is still strong--unlike the NASDAQ. You set your preferences to sort houses by price and display 10 per page. While you are browsing through the first 10, the database gets updated with three new inexpensive offers. (Living in the San Francisco Bay Area has erased the word cheap from my vocabulary.) That would shift some of the houses you're viewing three positions down. So what you actually are looking at are no longer rows 1 through 10. You're seeing the first, third, and fourth, and 7 through 13 (the newly inserted bargains take the second, fifth, and sixth positions).

This should not be a problem at all. After all, databases have to be up-to-date. Plus, the Web site cannot lock the database while you view the data; you are one of many users and the database cannot work exclusively for you. However, what would happen when you click the Next button to get the next 10 records? You would see the first three (11, 12, and 13) repeated again. That would either imply software problems or duplicates (or dupes) in that Web site's listings.

Comment and Contribute






(Maximum characters: 1200). You have 1200 characters left.



Thanks for your registration, follow us on our social networks to keep up-to-date