# Calculating Average Number of Employees in Each Department

Question:
[Joe Celko’s Personnel Problem Puzzle]

Daren Race was trying to aggregate the results from an aggregate result set using Gupta’s SQLBase and could not think of any way other than using a temporary table or a VIEW. This is an example of what he was doing:

`    Personnel:    name  deptid     ===============    Daren  Acct    Joe    Acct    Lisa   DP    Helen  DP    Fonda  DP`
Then he viewed the data as a aggregate by deptid:
`    SELECT deptid, COUNT(*)         FROM Personnel        GROUP BY deptid;`
The results will be:
`    Result    deptid COUNT(*)     ===============    Acct    2     DP      3`
Then he wanted to find the average department size! The way he did this was to use a VIEW:
`    CREATE VIEW DeptView (deptid, tally)         AS SELECT deptid, COUNT(*)             FROM Personnel             GROUP BY deptid;`
Then:
`    SELECT AVG(tally) FROM DeptView;`
He asked if anyone on the Gupta Forum on CompuServe could think of a way of doing this without using temporary tables (or views)? The two answers he got were:
`    SELECT AVG(DISTINCT deptid)         FROM Personnel;`
and
`    SELECT COUNT(*) / COUNT(DISTINCT deptid)         FROM Personnel;`
Your problem is to tell me what is wrong with each of them.

The first answer will grab the department numbers, throw away NULLs (there should not be any in this case), then throw away duplicates and average what is left. This has nothing to do with the number of people in each department; we will get (1+2/2) = 1.5 for an answer.

The second answer is really much better and will give us the right results for this data. We have a COUNT(*) = 5, and COUNT(DISTINCT deptid) = 2, so the answer we get is (2.5) just as we wished.

See also  How to Create and Deploy QR Codes Online: A Comprehensive Guide

But now we hire three new employees, Larry, Moe and Curly, who are not yet assigned to a department and our tables looks like this:

`    Personnel:    name  deptid     ===============    Daren  Acct    Joe    Acct    Lisa   DP    Helen  DP    Fonda  DP    Larry  NULL     Moe    NULL    Curly  NULL  `
We now have a COUNT(*) = 8, but COUNT(DISTINCT deptid) = 2 because it drops NULLs, so the answer we get is (4). The real answer is that we cannot determine an exact value, but we know that it is between (8/5 = 1.60) and (8/3 = 2.66) depending what we do with Larry, Moe and Curly.

If Mr. Race had stuck to his original method, we would have gotten:

`    Result    deptid COUNT(*)     ===============    Acct    2     DP      3    NULL    3`
and a final result of 1.5 as before because the NULLs would form a group by themselves in the VIEW, but then been dropped out by the average in the final query.

Puzzle provided courtesy of:
Joe Celko
[email protected]

## About Our Editorial Process

At DevX, we’re dedicated to tech entrepreneurship. Our team closely follows industry shifts, new products, AI breakthroughs, technology trends, and funding announcements. Articles undergo thorough editing to ensure accuracy and clarity, reflecting DevX’s style and supporting entrepreneurs in the tech sphere.

See our full editorial policy.