[Joe Celko’s Personnel Problem Puzzle]
Daren Race was trying to aggregate the results from an aggregate result set using Gupta’s SQLBase and could not think of any way other than using a temporary table or a VIEW. This is an example of what he was doing:
Personnel: name deptid =============== Daren Acct Joe Acct Lisa DP Helen DP Fonda DPThen he viewed the data as a aggregate by deptid:
SELECT deptid, COUNT(*) FROM Personnel GROUP BY deptid;The results will be:
Result deptid COUNT(*) =============== Acct 2 DP 3Then he wanted to find the average department size! The way he did this was to use a VIEW:
CREATE VIEW DeptView (deptid, tally) AS SELECT deptid, COUNT(*) FROM Personnel GROUP BY deptid;Then:
SELECT AVG(tally) FROM DeptView;He asked if anyone on the Gupta Forum on CompuServe could think of a way of doing this without using temporary tables (or views)? The two answers he got were:
SELECT AVG(DISTINCT deptid) FROM Personnel;and
SELECT COUNT(*) / COUNT(DISTINCT deptid) FROM Personnel;Your problem is to tell me what is wrong with each of them.
The first answer will grab the department numbers, throw away NULLs (there should not be any in this case), then throw away duplicates and average what is left. This has nothing to do with the number of people in each department; we will get (1+2/2) = 1.5 for an answer.
The second answer is really much better and will give us the right results for this data. We have a COUNT(*) = 5, and COUNT(DISTINCT deptid) = 2, so the answer we get is (2.5) just as we wished.
But now we hire three new employees, Larry, Moe and Curly, who are not yet assigned to a department and our tables looks like this:
Personnel: name deptid =============== Daren Acct Joe Acct Lisa DP Helen DP Fonda DP Larry NULL Moe NULL Curly NULLWe now have a COUNT(*) = 8, but COUNT(DISTINCT deptid) = 2 because it drops NULLs, so the answer we get is (4). The real answer is that we cannot determine an exact value, but we know that it is between (8/5 = 1.60) and (8/3 = 2.66) depending what we do with Larry, Moe and Curly.
If Mr. Race had stuck to his original method, we would have gotten:
Result deptid COUNT(*) =============== Acct 2 DP 3 NULL 3and a final result of 1.5 as before because the NULLs would form a group by themselves in the VIEW, but then been dropped out by the average in the final query.
Puzzle provided courtesy of: