  Advertiser Disclosure
 TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK Specialized Dev Zones Research Center eBook Library .NET Java C++ Web Dev Architecture Database Security Open Source Enterprise Mobile Special Reports 10-Minute Solutions DevXtra Blogs Slideshow      Author Feedback Print Article Comment on this Article # Calculate Percentiles with SQL Server 2005

## Learn one of the techniques for calculating percentiles with SQL Server 2005 using the new SQL Server Common Table Expression and the latest ROW_NUMBER function.

 by Cecil Lew
 Oct 2, 2007
 Page 1 of 2 ### WEBINAR:On-Demand

Building the Right Environment to Support AI, Machine Learning and Deep Learning hen you need to analyze numerical data, percentiles are handy for understanding the distribution. Oracle has been providing analytical functions such as NTILE, PERCENTILE_CONT, and RANK since version 8, while SQL Server has been playing catch up. With SQL Server 2005, Microsoft finally added NTILE, RANK, DENSE_RANK, and ROW_NUMBER to SQL Server's T-SQL functions. Unfortunately, the SQL Server 2005 still does not provide PERCENTILE_CONT, and neither will SQL Server 2008. They do, however, feature the new SQL Server Common Table Expression (CTE), which you can use to calculate percentiles.

This article presents a solution for calculating percentiles in SQL Server 2005 using SQL Server CTE. Based on a solution described in Joe Celko's SQL For Smarties, the idea is to calculate the percentile (percent rank) of each row; if one of the rows has the percentile value that you are looking for, you return it. Otherwise, you perform an interpolation between a value that is slightly more than the desired percentile and one that is slightly less (see Figure 1).

 Figure 1. Calculating the Percentiles Based on Ranking:You interpolate between a value that is slightly more than the desired percentile and one that is slightly less.

Calculating Percentiles with SQL Server CTE
This article's percentiles calculations are based on this fictitious table of employee salaries:

``````
CREATE TABLE employee
(
ID INT IDENTITY(1,1) PRIMARY KEY,
dept VARCHAR(30),
emp_name VARCHAR(80),
salary FLOAT
)

INSERT INTO employee(dept, emp_name, salary) values('PRODUCTION', 'Janice', 2000)
INSERT INTO employee(dept, emp_name, salary) values('PRODUCTION', 'Allan', 3500)
INSERT INTO employee(dept, emp_name, salary) values('PRODUCTION', 'Peter', 3300)
INSERT INTO employee(dept, emp_name, salary) values('PRODUCTION', 'Sam', 1100)
INSERT INTO employee(dept, emp_name, salary) values('SALES', 'Jerry', 2300)
INSERT INTO employee(dept, emp_name, salary) values('SALES', 'Robert', 5540)
INSERT INTO employee(dept, emp_name, salary) values('SALES', 'Larry', 4200)
INSERT INTO employee(dept, emp_name, salary) values('SALES', 'Gordon', 2200)
INSERT INTO employee(dept, emp_name, salary) values('SALES', 'Larry', 1120)
INSERT INTO employee(dept, emp_name, salary) values('SALES', 'Jane', 4120)
INSERT INTO employee(dept, emp_name, salary) values('MARKETING', 'Holly', 1300)
INSERT INTO employee(dept, emp_name, salary) values('MARKETING', 'John', 1000)
INSERT INTO employee(dept, emp_name, salary) values('MARKETING', 'Hunter', 6330)
INSERT INTO employee(dept, emp_name, salary) values('MARKETING', 'Yoda', 5500)
INSERT INTO employee(dept, emp_name, salary) values('MARKETING', 'Jackie', 2280)
INSERT INTO employee(dept, emp_name, salary) values('MARKETING', 'Rachel', 1490)
INSERT INTO employee(dept, emp_name, salary) values('ACCOUNTING', 'Seymour', 990)
INSERT INTO employee(dept, emp_name, salary) values('ACCOUNTING', 'Yoda', 2930)
INSERT INTO employee(dept, emp_name, salary) values('ACCOUNTING', 'Dave', 5400)
GO
``````

Here is the code to calculate the 75th percentile of employee salaries using the above table data:

``````
DECLARE @percentile FLOAT
SELECT @percentile = .75;

WITH emp_sal(salary, prev_rank, curr_rank, next_rank) AS
(
SELECT salary,
(ROW_NUMBER() OVER ( ORDER BY salary ) - 2.0) / ((SELECT COUNT(*) FROM employee) - 1)  [prev_rank],
(ROW_NUMBER() OVER ( ORDER BY salary ) - 1.0) / ((SELECT COUNT(*) FROM employee) - 1)  [curr_rank],
(ROW_NUMBER() OVER ( ORDER BY salary ) + 0.0) / ((SELECT COUNT(*) FROM employee) - 1)  [next_rank]
FROM employee
)
SELECT
CASE
WHEN t1.salary = t2.salary THEN t1.salary
ELSE t1.salary + (t2.salary - t1.salary) * ((@percentile - t1.curr_rank) /  (t2.curr_rank - t1.curr_rank))
END
FROM emp_sal t1, emp_sal t2
WHERE (t1.curr_rank = @percentile OR (t1.curr_rank < @percentile AND t1.next_rank > @percentile))
AND (t2.curr_rank = @percentile OR (t2.curr_rank > @percentile AND t2.prev_rank < @percentile))
``````

This query uses SQL Server CTE along with the new ROW_NUMBER() function. The emp_sal CTE at the top produces a table that ranks each row. The prev_rank and next_rank columns help locate a row for interpolation when the desired percentile does not fall onto one of the rows (see Figure 2).

 Figure 2. Salary Values and Ranking: The two highlighted rows will be used to calculate the 75th percentile.

The SELECT in the main body of the query extracts the two closest values and performs an interpolation. In this case, the formula that calculates the 75th percentile (using the values 4120 and 4200) is as follows:

``````
75th percentile = 4120 + (4200 - 4120) * (0.75 - 0.7222) / (0.7778 - 0.7222) = 4160
`````` Author Feedback Email Article Print Article Comment on this Article   Thanks for your registration, follow us on our social networks to keep up-to-date