devxlogo

Nested Outer Joins

Nested Outer Joins

Question:
[Joe Celko’s Outer (JOIN) Limits Puzzle]

Gerard Manko at ARI had a problem in 1994 April. They had just switched over from Paradox to Watcom SQL. The conversion of the legacy database was done on a one-for-one basis, without any thought of normalization. Yes, I know that as the SQL Guru, I should have sent him to whatever ring of hell we are reserving for people who don’t normalize. But that does not get the job done, and it is something we find in the real world all the time.

The system tracks teams of employees to work on jobs. Each job has a slot for a primary and a slot for an optional assistant mechanic. The tables involved look like this:

    CREATE TABLE Jobs        (jobid INTEGER NOT NULL PRIMARY KEY,        startdate DATE NOT NULL,        … );    CREATE TABLE Employees        (empid INTEGER NOT NULL PRIMARY KEY        name CHAR (20) NOT NULL,        …);    CREATE TABLE Teams        (jobid INTEGER NOT NULL,        mechslot INTEGER NOT NULL,        empid INTEGER NOT NULL,        … );
Your first task is just to add some integrity checking to the Teams table. We will not worry about normalization, since a lot of the procedures depend on having these tables set up as they are.

What you want to do is build a query for a report that lists all the jobs by jobid, the primary mechanic (if any), and the assistant mechanic (if any). Hint: you can get the jobids from Jobs because that table has all of the current jobs, while the Teams table lists only those jobs for which a team has been assigned.

See also  Why ChatGPT Is So Important Today

Answer:
The first problem is to add a little referential integrity. The Teams table should probably be tied to the others with foreign key references and it is always a good idea to check the codes in the database schema, like so:

    CREATE TABLE Teams        (jobid INTEGER NOT NULL REFERENCES Jobs(jobid),        mechslot INTEGER NOT NULL            CHECK (mechslot IN (‘Primary’, ‘Assistant’)),        empid INTEGER NOT NULL REFERENCES Employees(empid),        …);
The trick in this problem is that experienced SQL people will immediately think of using a LEFT OUTER JOIN because if they wanted to get just the primary mechanics, they would write:
    SELECT Jobs.jobid, Teams.empid AS Primary        FROM (Jobs LEFT OUTER JOIN Teams            ON Jobs.jobid = Teams.jobid)        WHERE Teams.mechslot = ‘Primary’;
Or realizing that Jobs is the preserved table in the left outer join:
    SELECT Jobs.jobid, Teams.empid AS Primary        FROM (Jobs LEFT OUTER JOIN Teams            ON (Jobs.jobid = Teams.jobid                AND Teams.mechslot = ‘Primary’));
The reader might want to verify for himself that if the Teams table were the preserved table, this transform would not work.

Or if you wanted to get the employees by name, you could nest the joins deeper:

    SELECT Jobs.jobid, Employees.name AS Primary        FROM ((Jobs LEFT OUTER JOIN Teams            ON Jobs.jobid = Teams.jobid)         AS P1 LEFT OUTER JOIN Employees             ON P1.empid = Employees.empid)        WHERE P1.mechslot = ‘Primary’;
The problem here is that you want to do two independent outer joins for each mechanic’s slot on a team and put the results in one table. It is probably possible to build a horrible, deeply nested self outer join all in one SELECT statement. But you could not read it or understand it. You could do the report with VIEWs for primary and assistant mechanics as shown, then put them together. But you can avoid all of this mess with:
    SELECT Jobs.jobid,        (SELECT empid FROM Teams            WHERE Jobs.jobid = Teams.jobid                AND Teams.mechslot = ‘Primary’) AS Primary,        (SELECT empid FROM Teams            WHERE Jobs.jobid = Teams.jobid                AND Teams.mechslot = ‘Assistant’) AS Assistant        FROM Jobs;
The neat SQL-92 trick is the ability to nest two independent scalar SELECT statements in the outermost SELECT. To add the employee’s name, just change the innermost SELECT statements.
    SELECT Jobs.jobid,        (SELECT name FROM Teams, Employees            WHERE Jobs.jobid = Teams.jobid                AND Employees.empid = Teams.empid                AND Teams.mechslot = ‘Primary’) AS Primary,        (SELECT name FROM Teams, Employees            WHERE Jobs.jobid = Teams.jobid                AND Employees.empid = Teams.empid                AND Teams.mechslot = ‘Assistant’) AS Assistant        FROM Jobs;
If you have a guy acting as both primary and assistant mechanic on a single job, then you will get him in both slots. If you have two or more primary mechanics, or you have two or more assistant mechanics on a job, then you will get an error. If you have no primary or assistant mechanics, then you will get an empty SELECT result and that becomes a NULL. That gives you the outer joins you wanted.
See also  Professionalism Starts in Your Inbox: Keys to Presenting Your Best Self in Email

Puzzle provided courtesy of:
Joe Celko
[email protected]

devxblackblue

About Our Editorial Process

At DevX, we’re dedicated to tech entrepreneurship. Our team closely follows industry shifts, new products, AI breakthroughs, technology trends, and funding announcements. Articles undergo thorough editing to ensure accuracy and clarity, reflecting DevX’s style and supporting entrepreneurs in the tech sphere.

See our full editorial policy.

About Our Journalist