Outer Joins for Reporting Data Stored in Different Rows

Question:
[Joe Celko’s Phone Book Puzzle]

You are trying to set up an office telephone directory and have the following tables:

    CREATE TABLE Employees        (empid INTEGER PRIMARY KEY,        firstname CHAR(20) NOT NULL,        lastname CHAR(20) NOT NULL);    INSERT INTO Employees VALUES (01, ‘John’, ‘Smith’);    INSERT INTO Employees VALUES (02, ‘Fred’, ‘Jones’);    INSERT INTO Employees VALUES (03, ‘Bill’, ‘Black’);    INSERT INTO Employees VALUES (04, ‘Joe’, ‘Celko’);    CREATE TABLE Phones        (empid INTEGER NOT NULL,        phonetype CHAR(3) NOT NULL CHECK phonetype IN (‘hom’, ‘fax’),        phonenumber CHAR(12) NOT NULL,        PRIMARY KEY (empid, phonetype, phonenumber),        FOREIGN KEY empid REFERENCES Employees.empid );    INSERT INTO Phones VALUES (01, ‘hom’, ‘213-555-0101’);    INSERT INTO Phones VALUES (01, ‘fax’, ‘213-555-0101’);    INSERT INTO Phones VALUES (02, ‘hom’, ‘213-555-0202’);    INSERT INTO Phones VALUES (03, ‘fax’, ‘213-555-0303’);
The codes ‘hom’ and ‘fax’ tell you if the number is their home phone number or if it is their FAX number. What you want to print out is a report with one line per employee, giving both numbers and showing a NULL if either or both are missing.

It is worth noting that the FOREIGN KEY constraint on the Phones table means that you cannot list a telephone number for someone who is not an employee. The PRIMARY KEY looks a bit large, until you stop and think about all the cases. Employees could share the same FAX or home telephones and a single line could be both voice and FAX services.

Answer:
There are a lot of bad starts to this query. The first thought is to construct the home telephone information as a query in its own right. Since we want to see all the employees, we need an outer join:

    CREATE VIEW Hphones (lastname, firstname, empid, homenumber)        AS SELECT E1.lastname, E1.firstname,             E1.empid, H1.phonenumber        FROM (Employees AS E1 LEFT OUTER JOIN Phones AS H1            ON (E1.empid = H1.empid AND H1.phonetype = ‘hom’));
Likewise, we could construct the FAX telephone information as a query the same way:
    CREATE VIEW Fphones (lastname, firstname, empid, faxnumber)        AS SELECT E1.lastname, E1.empid, F1.phonenumber        FROM (Employees AS E1 LEFT OUTER JOIN Phones AS F1            ON (E1.empid = F1.empid AND F1.phonetype = ‘fax’));
It would seem reasonable to combine these two VIEWs and get the results.
    SELECT Hphones.lastname,             Hphones.firstname,             homenumber, faxnumber        FROM Hphones AS H1, Fphones AS F1        WHERE H1.empid = F1.empid;
But this does not work because it leaves out the “FAX only” people. If we want to preserve both tables, then we need a FULL OUTER JOIN, which might look like this:
    SELECT H1.lastname, H1.firstname, homenumber, faxnumber        FROM (Hphones AS H1 FULL OUTER JOIN Fphones AS F1            ON H1.empid = F1.empid);
But this still does not print the names of the “FAX only” people ? they show up as NULLs since we are only printing the Hphones people names. The COALESCE() function will take care of that problem for us, like this:
    SELECT COALESCE (H1.lastname, F1.lastname),             COALESCE (H1.firstname, F1.firstname),             homenumber, faxnumber        FROM (Hphones AS H1 FULL OUTER JOIN Fphones AS F1            ON H1.empid = F1.empid);
The bad news is that this will work. It will also run like glue since it will probably materialize the VIEWs before using them.

The real trick is to go back and see that the Fphones and Hphones VIEWs are outer joined to the Employees table. The Employees table can be factored out and the two FROM clauses combined to give:

    SELECT E1.lastname, E1.firstname,            H1.phonenumber AS Home,             F1.phonenumber AS FAX        FROM ((Employees AS E1 LEFT OUTER JOIN Phones AS H1            ON (E1.empid = H1.empid AND H1.phonetype = ‘hom’))        LEFT OUTER JOIN Phones AS F1            ON (E1.empid = F1.empid AND F1.phonetype = ‘fax’));
Since this gets all the tables at once, it should run a good bit faster than the false start.

Puzzle provided courtesy of:
Joe Celko
[email protected]

Share the Post:
Share on facebook
Share on twitter
Share on linkedin

Overview

Recent Articles: