Calculating Average Number of Employees in Each Department

Calculating Average Number of Employees in Each Department

Question:
[Joe Celko’s Personnel Problem Puzzle]

Daren Race was trying to aggregate the results from an aggregate result set using Gupta’s SQLBase and could not think of any way other than using a temporary table or a VIEW. This is an example of what he was doing:

    Personnel:    name  deptid     ===============    Daren  Acct    Joe    Acct    Lisa   DP    Helen  DP    Fonda  DP
Then he viewed the data as a aggregate by deptid:
    SELECT deptid, COUNT(*)         FROM Personnel        GROUP BY deptid;
The results will be:
    Result    deptid COUNT(*)     ===============    Acct    2     DP      3
Then he wanted to find the average department size! The way he did this was to use a VIEW:
    CREATE VIEW DeptView (deptid, tally)         AS SELECT deptid, COUNT(*)             FROM Personnel             GROUP BY deptid;
Then:
    SELECT AVG(tally) FROM DeptView;
He asked if anyone on the Gupta Forum on CompuServe could think of a way of doing this without using temporary tables (or views)? The two answers he got were:
    SELECT AVG(DISTINCT deptid)         FROM Personnel;
and
    SELECT COUNT(*) / COUNT(DISTINCT deptid)         FROM Personnel;
Your problem is to tell me what is wrong with each of them.

Answer:
The first answer will grab the department numbers, throw away NULLs (there should not be any in this case), then throw away duplicates and average what is left. This has nothing to do with the number of people in each department; we will get (1+2/2) = 1.5 for an answer.

The second answer is really much better and will give us the right results for this data. We have a COUNT(*) = 5, and COUNT(DISTINCT deptid) = 2, so the answer we get is (2.5) just as we wished.

But now we hire three new employees, Larry, Moe and Curly, who are not yet assigned to a department and our tables looks like this:

    Personnel:    name  deptid     ===============    Daren  Acct    Joe    Acct    Lisa   DP    Helen  DP    Fonda  DP    Larry  NULL     Moe    NULL    Curly  NULL  
We now have a COUNT(*) = 8, but COUNT(DISTINCT deptid) = 2 because it drops NULLs, so the answer we get is (4). The real answer is that we cannot determine an exact value, but we know that it is between (8/5 = 1.60) and (8/3 = 2.66) depending what we do with Larry, Moe and Curly.

If Mr. Race had stuck to his original method, we would have gotten:

    Result    deptid COUNT(*)     ===============    Acct    2     DP      3    NULL    3
and a final result of 1.5 as before because the NULLs would form a group by themselves in the VIEW, but then been dropped out by the average in the final query.

Puzzle provided courtesy of:
Joe Celko
[email protected]

Share the Post:
XDR solutions

The Benefits of Using XDR Solutions

Cybercriminals constantly adapt their strategies, developing newer, more powerful, and intelligent ways to attack your network. Since security professionals must innovate as well, more conventional endpoint detection solutions have evolved

AI is revolutionizing fraud detection

How AI is Revolutionizing Fraud Detection

Artificial intelligence – commonly known as AI – means a form of technology with multiple uses. As a result, it has become extremely valuable to a number of businesses across

AI innovation

Companies Leading AI Innovation in 2023

Artificial intelligence (AI) has been transforming industries and revolutionizing business operations. AI’s potential to enhance efficiency and productivity has become crucial to many businesses. As we move into 2023, several

data fivetran pricing

Fivetran Pricing Explained

One of the biggest trends of the 21st century is the massive surge in analytics. Analytics is the process of utilizing data to drive future decision-making. With so much of

kubernetes logging

Kubernetes Logging: What You Need to Know

Kubernetes from Google is one of the most popular open-source and free container management solutions made to make managing and deploying applications easier. It has a solid architecture that makes

ransomware cyber attack

Why Is Ransomware Such a Major Threat?

One of the most significant cyber threats faced by modern organizations is a ransomware attack. Ransomware attacks have grown in both sophistication and frequency over the past few years, forcing

data dictionary

Tools You Need to Make a Data Dictionary

Data dictionaries are crucial for organizations of all sizes that deal with large amounts of data. they are centralized repositories of all the data in organizations, including metadata such as