TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
 Specialized Dev Zones Research Center eBook Library .NET Java C++ Web Dev Architecture Database Security Open Source Enterprise Mobile Special Reports 10-Minute Solutions DevXtra Blogs Slideshow

 Home » Tip Bank » Database Development » SQL » Transact-sql
Language: SQL Server
Expertise: Beginner
Mar 24, 1997

### WEBINAR:On-Demand

Building the Right Environment to Support AI, Machine Learning and Deep Learning

# Calculating Average Number of Employees in Each Department

Question:
[Joe Celko's Personnel Problem Puzzle]

Daren Race was trying to aggregate the results from an aggregate result set using Gupta's SQLBase and could not think of any way other than using a temporary table or a VIEW. This is an example of what he was doing:

```    Personnel:
name  deptid
===============
Daren  Acct
Joe    Acct
Lisa   DP
Helen  DP
Fonda  DP
```
Then he viewed the data as a aggregate by deptid:
```    SELECT deptid, COUNT(*)
FROM Personnel
GROUP BY deptid;
```
The results will be:
```    Result
deptid COUNT(*)
===============
Acct    2
DP      3
```
Then he wanted to find the average department size! The way he did this was to use a VIEW:
```    CREATE VIEW DeptView (deptid, tally)
AS SELECT deptid, COUNT(*)
FROM Personnel
GROUP BY deptid;
```
Then:
```    SELECT AVG(tally) FROM DeptView;
```
He asked if anyone on the Gupta Forum on CompuServe could think of a way of doing this without using temporary tables (or views)? The two answers he got were:
```    SELECT AVG(DISTINCT deptid)
FROM Personnel;
```
and
```    SELECT COUNT(*) / COUNT(DISTINCT deptid)
FROM Personnel;
```
Your problem is to tell me what is wrong with each of them.

The first answer will grab the department numbers, throw away NULLs (there should not be any in this case), then throw away duplicates and average what is left. This has nothing to do with the number of people in each department; we will get (1+2/2) = 1.5 for an answer.

The second answer is really much better and will give us the right results for this data. We have a COUNT(*) = 5, and COUNT(DISTINCT deptid) = 2, so the answer we get is (2.5) just as we wished.

But now we hire three new employees, Larry, Moe and Curly, who are not yet assigned to a department and our tables looks like this:

```    Personnel:
name  deptid
===============
Daren  Acct
Joe    Acct
Lisa   DP
Helen  DP
Fonda  DP
Larry  NULL
Moe    NULL
Curly  NULL
```
We now have a COUNT(*) = 8, but COUNT(DISTINCT deptid) = 2 because it drops NULLs, so the answer we get is (4). The real answer is that we cannot determine an exact value, but we know that it is between (8/5 = 1.60) and (8/3 = 2.66) depending what we do with Larry, Moe and Curly.

If Mr. Race had stuck to his original method, we would have gotten:

```    Result
deptid COUNT(*)
===============
Acct    2
DP      3
NULL    3
```
and a final result of 1.5 as before because the NULLs would form a group by themselves in the VIEW, but then been dropped out by the average in the final query.

Puzzle provided courtesy of:
Joe Celko
71062.1056@compuserve.com

DevX Pro

 Submit a Tip Browse "Database Development" Tips Browse All Tips
Comment and Contribute

(Maximum characters: 1200). You have 1200 characters left.

Thanks for your registration, follow us on our social networks to keep up-to-date