he task of retrieving the top or bottom N rows from a database table (by salary, sales amount, credit, etc.) is often referred to as a “top-N query.” This task is fairly common in application development.
The most straightforward, but inefficient, way of accomplishing such a query is by retrieving all rows from the database table(s), sorting them by specified criteria, scanning from the top, and then selecting the top N rows. This is definitely not an elegant solution.
Writing a procedure that retrieves and sorts all rows from a database table requires lots of code, and it probably won’t execute as quickly as you’d expect.
Use a single SQL statement to perform a top-N query. You can do so either by using the ROWNUM pseudocolumn available in several versions of Oracle or by utilizing new analytic functions available in Oracle 8i: RANK() and DENSE_RANK().
In this 10-Minute Solution, I am going to use table EMP in the queries examples of this article. This table belongs to the SCOTT schema of the Oracle starter database provided with Oracle database server versions 7.3, 8.0, and 8i.
Here is the table’s structure:
Let’s start out by updating employee James (Empno = 7900) and setting his salary to NULL:
Name Null? Type———– ——– ————EMPNO NOT NULL NUMBER(4)ENAME VARCHAR2(10)JOB VARCHAR2(9)MGR NUMBER(4)HIREDATE DATESAL NUMBER(7,2)COMM NUMBER(7,2)DEPTNO NUMBER(2)
UPDATE Emp SET Sal = NULL WHERE Empno = 7900; COMMIT;
Now we’ll look at the data in Emp table with 14 rows:
Listing 1 shows the output of the above code.Using the ROWNUM Pseudocolumn
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal FROM Emp;
One way to solve this problem is by using the Oracle pseudocolumn ROWNUM. ROWNUM is available in Oracle versions 7, 8.0 and 8i (8.1); I am not sure about Oracle 6. For each row returned by a query, the ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows. The first row selected has a ROWNUM of 1, the second has 2, and so on.
You can use the ROWNUM pseudocolumn to limit the number of rows returned by a query to 5:
Listing 2 shows the output of the above code. As you see, the first five rows have been returned.
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal FROM Emp WHERE ROWNUM < 6;
If an ORDER BY clause follows ROWNUM in the same query, the rows will be reordered by the ORDER BY clause.
Listing 3 shows the output of the above code.
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal FROM Emp WHERE ROWNUM < 6 ORDER BY Sal;
Watch Out! Because the ROWNUM is assigned upon retrieval, it is assigned prior to any sorting! This is opposite to the result you would get in SQL Server using the TOP clause. In order to select employees with the highest five salaries, you have to force sorting and then apply ROWNUM condition. Here is the syntax for a top-N query where N = 5 (this syntax with the subquery works only in Oracle 8i):
Listing 4 shows the output of the above code. I used the NVL() function to sort the expression because sorting just by Emp_Salary would have put all records with NULL salary before those with the highest salaries, and that’s not what we wanted to achieve.Utilizing Oracle 8i’s Ranking Functions
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal FROM (SELECT Empno, Ename, Job, Mgr, Hiredate, Sal FROM Emp ORDER BY NVL(Sal, 0) DESC) WHERE ROWNUM < 6;
Another way to perform a top-N query uses the new Oracle 8i feature called “analytic functions.” The SQL language, while extremely capable in many areas, has never provided strong support for analytic tasks, such as computing rankings, cumulative and moving averages, lead/lag comparisons, and reporting. These tasks require extensive PL/SQL programming, often with performance issues. Oracle 8i now provides a new wide set of analytic functions that address this need.
For a top-N query you can use two ranking functions: RANK and DENSE_RANK. Both allow you to rank items in a group?for example, finding the top-five employees by salary, which is exactly what we need to achieve.
The difference between RANK() and DENSE_RANK() is that RANK() leaves gaps in the ranking sequence when there are ties. In our case, Scott and Ford tie for second place with a $3,000 salary; Jones’ $2,975 salary brings him in third place using DENSE_RANK() but only fourth place using RANK():
Listing 5 shows the output of the above code.
SELECT Empno, Ename, Job, Mgr, Sal, RANK() OVER (ORDER BY SAL Desc NULLS LAST) AS Rank, DENSE_RANK() OVER (ORDER BY SAL Desc NULLS LAST) AS Drank FROM Emp ORDER BY SAL Desc NULLS LAST;
The NULLS FIRST | NULLS LAST clause determines the position of rows with NULL values in the ordered query.
If the sequence is in descending order, then NULLS LAST implies that NULL values are smaller than non-NULL ones and rows with NULLs will appear at the bottom of the list. If the NULLS FIRST | NULLS LAST clause is omitted, then NULL values are considered larger than any other values and their ordering position depends on the ASC | DESC arguments.
If the ordering sequence is ascending (ASC), then rows with NULLs will appear last; if the sequence is descending (DESC), then rows with NULLs will appear first. NULLs are considered equal to other NULLs and, therefore, the order in which rows with NULLs are presented is nondeterministic.Using RANK() to Obtain a Top-N Query
To obtain a top-N query, use RANK() in a subquery and then apply a filter condition outside the subquery:
Listing 6 shows the output of the above code.
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal FROM (SELECT Empno, Ename, Job, Mgr, Hiredate, Sal, RANK() OVER (ORDER BY SAL Desc NULLS LAST) AS Emp_Rank FROM Emp ORDER BY SAL Desc NULLS LAST) WHERE Emp_Rank < 6;
Using the same technique, you can retrieve the bottom-five employees by salary:
Listing 7 shows the output of the above code.
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal FROM (SELECT Empno, Ename, Job, Mgr, Hiredate, Sal, RANK() OVER (ORDER BY SAL ASC NULLS FIRST) AS Emp_Rank FROM Emp ORDER BY SAL ASC NULLS FIRST) WHERE Emp_Rank < 6;
Ranking functions can be used to operate within groups, too?that is, the rank value gets reset whenever the group changes. This is achieved with a PARTION BY subclause. Here is the syntax to retrieve the top employee by salary per manager group:
Listing 8 shows the output of the above code.
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal FROM (SELECT Empno, Ename, Job, Mgr, Hiredate, Sal, RANK() OVER (PARTITION BY MGR ORDER BY MGR, SAL DESC NULLS LAST) AS Emp_Rank FROM Emp ORDER BY MGR, SAL DESC NULLS LAST) WHERE Emp_Rank = 1;
As you can see, analytic functions are extremely useful in all types of analysis and computations, and they provide substantial SQL extensions to Oracle 8i.