Performing Top-N Queries in Oracle

he task of retrieving the top or bottom N rows from a database table (by salary, sales amount, credit, etc.) is often referred to as a “top-N query.” This task is fairly common in application development.

The most straightforward, but inefficient, way of accomplishing such a query is by retrieving all rows from the database table(s), sorting them by specified criteria, scanning from the top, and then selecting the top N rows. This is definitely not an elegant solution.



Writing a procedure that retrieves and sorts all rows from a database table requires lots of code, and it probably won’t execute as quickly as you’d expect.



Use a single SQL statement to perform a top-N query. You can do so either by using the ROWNUM pseudocolumn available in several versions of Oracle or by utilizing new analytic functions available in Oracle 8i: RANK() and DENSE_RANK().

Setting Up the Table
In this 10-Minute Solution, I am going to use table EMP in the queries examples of this article. This table belongs to the SCOTT schema of the Oracle starter database provided with Oracle database server versions 7.3, 8.0, and 8i.

Here is the table’s structure:

Name         Null?     Type———–  ——–  ————EMPNO        NOT NULL  NUMBER(4)ENAME                  VARCHAR2(10)JOB                    VARCHAR2(9)MGR                    NUMBER(4)HIREDATE               DATESAL                    NUMBER(7,2)COMM                   NUMBER(7,2)DEPTNO                 NUMBER(2)
Let’s start out by updating employee James (Empno = 7900) and setting his salary to NULL:
UPDATE Emp   SET Sal = NULL   WHERE Empno = 7900;   COMMIT;

Now we’ll look at the data in Emp table with 14 rows:

SELECT Empno, Ename, Job, Mgr, Hiredate, Sal   FROM Emp;
Listing 1 shows the output of the above code.Using the ROWNUM Pseudocolumn
One way to solve this problem is by using the Oracle pseudocolumn ROWNUM. ROWNUM is available in Oracle versions 7, 8.0 and 8i (8.1); I am not sure about Oracle 6. For each row returned by a query, the ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows. The first row selected has a ROWNUM of 1, the second has 2, and so on.

You can use the ROWNUM pseudocolumn to limit the number of rows returned by a query to 5:

SELECT Empno, Ename, Job, Mgr, Hiredate, Sal    FROM Emp    WHERE ROWNUM < 6;
Listing 2 shows the output of the above code. As you see, the first five rows have been returned.

If an ORDER BY clause follows ROWNUM in the same query, the rows will be reordered by the ORDER BY clause.

SELECT Empno, Ename, Job, Mgr, Hiredate, Sal    FROM Emp    WHERE ROWNUM < 6    ORDER BY Sal;
Listing 3 shows the output of the above code.

Watch Out! Because the ROWNUM is assigned upon retrieval, it is assigned prior to any sorting! This is opposite to the result you would get in SQL Server using the TOP clause. In order to select employees with the highest five salaries, you have to force sorting and then apply ROWNUM condition. Here is the syntax for a top-N query where N = 5 (this syntax with the subquery works only in Oracle 8i):

SELECT Empno, Ename, Job, Mgr, Hiredate, Sal   FROM   (SELECT Empno, Ename, Job, Mgr, Hiredate, Sal      FROM Emp      ORDER BY NVL(Sal, 0) DESC)   WHERE ROWNUM < 6;
Listing 4 shows the output of the above code. I used the NVL() function to sort the expression because sorting just by Emp_Salary would have put all records with NULL salary before those with the highest salaries, and that’s not what we wanted to achieve.Utilizing Oracle 8i’s Ranking Functions
Another way to perform a top-N query uses the new Oracle 8i feature called “analytic functions.” The SQL language, while extremely capable in many areas, has never provided strong support for analytic tasks, such as computing rankings, cumulative and moving averages, lead/lag comparisons, and reporting. These tasks require extensive PL/SQL programming, often with performance issues. Oracle 8i now provides a new wide set of analytic functions that address this need.

For a top-N query you can use two ranking functions: RANK and DENSE_RANK. Both allow you to rank items in a group?for example, finding the top-five employees by salary, which is exactly what we need to achieve.

The difference between RANK() and DENSE_RANK() is that RANK() leaves gaps in the ranking sequence when there are ties. In our case, Scott and Ford tie for second place with a $3,000 salary; Jones’ $2,975 salary brings him in third place using DENSE_RANK() but only fourth place using RANK():

SELECT Empno, Ename, Job, Mgr, Sal,   RANK() OVER      (ORDER BY SAL Desc NULLS LAST) AS Rank,   DENSE_RANK() OVER      (ORDER BY SAL Desc NULLS LAST) AS Drank   FROM Emp   ORDER BY SAL Desc NULLS LAST;
Listing 5 shows the output of the above code.

The NULLS FIRST | NULLS LAST clause determines the position of rows with NULL values in the ordered query.

If the sequence is in descending order, then NULLS LAST implies that NULL values are smaller than non-NULL ones and rows with NULLs will appear at the bottom of the list. If the NULLS FIRST | NULLS LAST clause is omitted, then NULL values are considered larger than any other values and their ordering position depends on the ASC | DESC arguments.

If the ordering sequence is ascending (ASC), then rows with NULLs will appear last; if the sequence is descending (DESC), then rows with NULLs will appear first. NULLs are considered equal to other NULLs and, therefore, the order in which rows with NULLs are presented is nondeterministic.

Using RANK() to Obtain a Top-N Query
To obtain a top-N query, use RANK() in a subquery and then apply a filter condition outside the subquery:
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal   FROM   (SELECT Empno, Ename, Job, Mgr, Hiredate, Sal,      RANK() OVER         (ORDER BY SAL Desc NULLS LAST) AS Emp_Rank      FROM Emp      ORDER BY SAL Desc NULLS LAST)   WHERE Emp_Rank < 6;
Listing 6 shows the output of the above code.

Using the same technique, you can retrieve the bottom-five employees by salary:

SELECT Empno, Ename, Job, Mgr, Hiredate, Sal   FROM   (SELECT Empno, Ename, Job, Mgr, Hiredate, Sal,      RANK() OVER         (ORDER BY SAL ASC NULLS FIRST) AS Emp_Rank      FROM Emp      ORDER BY SAL ASC NULLS FIRST)   WHERE Emp_Rank < 6;
Listing 7 shows the output of the above code.

Ranking functions can be used to operate within groups, too?that is, the rank value gets reset whenever the group changes. This is achieved with a PARTION BY subclause. Here is the syntax to retrieve the top employee by salary per manager group:

SELECT Empno, Ename, Job, Mgr, Hiredate, Sal   FROM   (SELECT Empno, Ename, Job, Mgr, Hiredate, Sal,      RANK() OVER         (PARTITION BY MGR ORDER BY MGR, SAL DESC NULLS LAST) AS Emp_Rank      FROM Emp      ORDER BY MGR, SAL DESC NULLS LAST)   WHERE Emp_Rank = 1;
Listing 8 shows the output of the above code.

As you can see, analytic functions are extremely useful in all types of analysis and computations, and they provide substantial SQL extensions to Oracle 8i.

Share the Post:
Share on facebook
Share on twitter
Share on linkedin

Overview

Recent Articles: