he cornerstone of the relational database system is its multiple tables and the ability to associate those tables with one another so you can combine data from different but related tables. Most of the time these associations are intuitive and work just as you expect. However, problems can arise when one of these associations returns unexpected results. In other words, the records you get, aren’t the records you know you should be seeing.
Fortunately, you can dynamically relate different tables by applying what’s known as a join. Technically, a join is the operation used when selecting data to create these relationships at retrieval time. What that means to you is that a join determines which records are selected and which aren’t. In this article, we’ll introduce you to several types of joins supported by SQL and show you how to correctly apply them to get the data you need.
Technically, a join is an operation (it may be easier to think of this operation as a rule) that defines a relationship to determine which records are retrieved or acted upon. What that means is that a join is conditional?similar to a WHERE clause or criteria expression?in that the join specifies which records (rows) are selected in both tables. Certainly, the type of join significantly impacts which records are retrieved or acted upon.
Before we get started, we should clarify that this article is about SQL joins, not about the way in which particular systems support SQL joins. Although, we will discuss actual examples, for syntax purposes, we won’t delve into the many nuances of each version-specific SQL. Examples will include the most basic syntax possible for simplicity’s sake. For instance, we won’t show alias syntax because they’re all different?those details are better left to articles that deals with version-specific SQL information.
For the most part, a join can be specified in a FROM or WHERE clause, so be careful where you specify them. Because the data engine executes the clauses in a specific sequence, placement can affect the results. Use the following guidelines to determine the best place for a particular join, although most of the time, the position won’t make any difference:
- Generally, the data engine fulfills the FROM clause before applying any other clauses. Adding the join here eliminates records before they’re evaluated by any other clauses. As a rule, you’ll find this is the recommended method.
- Any records that make it past the FROM clause arethen evaluated by an implicit join condition specified in the WHERE clause (if any).
To specify a join in the FROM clause, use the form:
SELECT fldlist | *FROM table1 join table2 [ON (condition)]
Use it where table1 and table2 represent the related tables, join is the type of join applied to the relationship, and condition specifies the columns you’re comparing to restrict the query’s results. Usually that condition is a simple expression that compares the primary and foreign key values in the form:
ON table1.primarykey = table2.foreignkey
At this point, its important to stress that the primarykey and foreignkey references can actually refer to any column, as long as the data types match, because you can create a temporary relationship between two non-key fields (although you probably won’t do so very often). This article uses the primarykey/foreignkey terms because they help emphasis the action that is going on behind the scenes. It is much more difficult to visualize the result of joining two non-key columns. However, technically, you aren’t restricted to the primary/foreign key fields, and you may need the capability some day. Just remember that a relationship based on non-key columns is temporary and exists only between the two tables involved.
A Few Guidelines
There are a few general guidelines that are common to almost all joins:
- The columns (fields) specified in fldlist can include all of the columns in or just a subset from the data source.
- A join can be built on a column not referenced in fldlist.
- Ambiguous columns must be fully qualified.
- The above syntax form is based on just two related tables even though SQL can handle multiple joins in the same statement.
- Although the Equal To (=) operator is the most common, you can use any of the following SQL comparison operators: =, >, <, >=, <=, and <>. (This isn’t true of all systems, be sure to check your system’s documentation for more specific information.)
|Figure 1: A simple inner join displays the category name for each product.|
Now, take a look at a quick example using Microsoft SQL Server’s sample database, Northwind. The following statement lists each product by name and category, as shown in Figure 1:
SELECT Products.ProductName, Categories.CategoryNameFROM Products JOIN Categories ON Products.CategoryID = Categories.CategoryID
(The figure doesn’t show all 77 records.) In the example above the joined columns have identical names, but that isn’t required. Just remember the data type must still be the same. Oracle introduced this syntax with 9i; it was not available in previous releases.
If you run the statement in Microsoft Access use INNER JOIN instead of JOIN as follows:
SELECT Products.ProductName, Categories.CategoryNameFROM Products INNER JOIN Categories ON Products.CategoryID = Categories.CategoryID
Jet SQL doesn’t support the lone JOIN keyword. Transact-SQL (T-SQL) and Oracle SQL both interpret the lone JOIN keyword as an inner join.
Occasionally, you may want to express a join condition in the WHERE clause (although you probably won’t have this need too often). Remember, when you do so, the join is implicit, but condition is expressed in the same manner using the same rules for the FROM syntax. Some systems only allow this syntax to be used with an inner join, which you’ll learn about in the next section. (There are exceptions. Oracle releases prior to 9i, supported only implicit joins in the WHERE clause. Consult your system’s documentation for the supported syntax.)
To express a join in the WHERE clause, use this form:
SELECT fldlist | *FROM table1, table2WHERE condition
where condition is a simple comparison of two columns. For instance, the previous FROM example could just as easily be written like this:
SELECT Products.ProductName, Categories.CategoryNameFROM Products, CategoriesWHERE Products.CategoryID = Categories.CategoryID
Both statements yield the same recordset, but that won’t always be the case. While it is true that both statements in this example return the same results, the two syntax forms aren’t interchangeable on a technical level.
Joining Three or More Tables
It was mentioned earlier that you can join multiple tables. The truth is that the complete statement can be fairly complex?accommodating several join clauses. The rules are the same, but the most important thing to remember is that each join applies to only one relationship, and each relationship represents an association between only two tables. Therefore, joins must build upon one another. When joining more than two tables, use the following syntax:
FROM table1 join table2 ON table1.primarykey = table2.foreignkey join table3 ON table2.primarykey = table3.foreignkey
The first join applies to table1 and table2, the second join applies to table2 and table3. You might hear table2 referred to as the intermediate or translation table because it’s the connection table for the other two tables. In a really complex statement, you might have several intermediate tables. Continue to add joins in this manner until your statement includes all the relationships and tables necessary to get just the right results. This is another fairly generic behavior that may or may not be strictly enforced by your system; be sure to check your system’s documentation.
The previous example used what may be a new term for you?inner join. SQL supports a number of join types: inner, left outer, right outer, full outer, and cross. Not all systems support all join types. Check your system’s documentation for specifics on which joins it supports.
An inner join returns only those records from both tables that have a matching value in the related field. This type of join is by far the most commonly used. In fact, many systems use this type as the default join. Think of your tables as intersecting circles, like those shown in Figure 2. Each circle represents a table within a relationship and the area where the circles intersect represents the records that have a matching primary and foreign key value. SQL retrieves or acts upon those records?and only those records?where the two circles intersect. (Again, remember that relationships aren’t confined to key fields, although most of the time, that is the arrangement you’ll be working with.)
|Figure 2: An inner join returns only those records where the primary and foreign key values match.|
The earlier Northwind example was an inner join?retrieving data from both tables only when the CategoryID values matched. In this particular example, each product record had a corresponding category value, but that won’t always be the case. Sometimes records are eliminated simply because they don’t have a matching primary/foreign key value. As you can see, a join works somewhat like a filter.
Oracle SQL enhances your flexibility with the release of Oracle 9i by supporting additional join syntax. Specifically, Oracle SQL (9i only) supports what it refers to as a natural join. This type of join automatically joins all columns in both tables with the same name?thus eliminating the need to include the join in a WHERE statement. The natural join may be used in both inner and outer joins using the following syntax:
SELECT fldlistFROM table1 NATURAL JOIN table2
Another Oracle SQL (9i only) addition is the USING clause. Rely on this new clause when tables have multiple columns with the same name and you want to limit the join to one particular column. You can modify the previous syntax by replacing the ON clause with a USING clause as follows:
SELECT fldlistFROM table1 JOIN table2USING (column)
When using this syntax, don’t qualify the column name in the fldlist or USING clause with the table name. Just remember that the ON and USING clauses aren’t interchangeable?the results will differ.
Now, take a look at an Oracle counterpart to the earlier Northwind inner join example. The following statement returns all the records where CategoryID in the Products table matches CategoryID in Categories:
SELECT Products.ProductName, Categories.CategoryNameFROM Products JOIN CategoriesUSING (CategoryID)
You’ll learn about the outer joins in the next section. (The examples written in Oracle are for syntax comparison only.)
The Outer Joins
An inner join’s ON condition retrieves only those records that satisfy the join condition. An outer join does the same thing but with the addition of returning records for one table in which there were no matching records in the other table.
There are three types of outer joins: left outer, right outer, and full outer. All outer joins retrieve records from both tables, just as an inner join does. However, an outer join retrieves all of the records from one of the tables. A column in the result is NULL if the corresponding input table did not contain a matching record.
|Figure 3: A left outer join returns all the records from the left table, or the one side of a relationship.|
The left outer join retrieves records from both tables, retrieving all the records from the left table and any records from the right table where the condition values match. If there are no matching values in from the right table, the join still retrieves all the records from the left table. Any columns from the right table that are unmatchedare left NULL. Consequently, the resulting recordset often appears to have incomplete records.
The circles in Figure 3 illustrate the concept of a left outer join. The left circle is the left table and all the records in this circle are returned. The intersection represents the matching primary/foreign key values between the two tables.
The following statement illustrates the usefulness of a left outer join:
SELECT Customers.CustomerID, Customers.CompanyName, Orders.OrderIDFROM Customers LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerIDORDER BY Customers.CustomerID
|Figure 4: A simple left outer join helps you quickly spot inactive customers.|
In Figure 4, the join returns all the customer records from the Customers table, and relates each customer to every order number in the Orders table. By adding the ORDER BY clause, you can easily spot those customers that haven’t ordered anything. (The same statement works in the Access .mdb version of Northwind.)
A similar statement in Oracle 9i resembles the following:
SELECT Customers.CustomerID, Customers.CompanyName, Orders.OrderIDFROM Customers LEFT OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID ORDER BY Customers.CustomerID
Earlier versions use the following:
SELECT Customers.CustomerID, Customers.CompanyName, Orders.OrderIDFROM Customers, OrdersWHERE Customers.CustomerID = Orders.CustomerID(+)ORDER BY Customers.CustomerID
|Figure 5: A right outer join returns all the records from the right table, or the many side of a relationship.|
The (+) symbol denotes the table (side) that may have no matching rows to the other table (side). Think of this as everything from the left plus matching values from the right. Notice the change between the two versions. Earlier versions are similar to the generic syntax discussed in the “SQL Standards” at the beginning of this article.
The right outer join is similar to the left outer join in that it retrieves all the records from one side of the relationship, but this time it’s the right table. Only records where the condition values match are retrieved from the left. The circles in Figure 5 illustrate this join; notice that the only difference between this set of circles and the last is that the shaded circle is on the right?that right circle represents the right or many table in a relationship.
The Outer Joins (cont’d)
The next example relates all the customers to the current orders and by doing so exposes customers that haven’t ordered. You can run the same statement using a right outer join to exclude the two Null valued records, but an inner join will do the same. If the Orders table contains any orphans (records without a matching value), a right outer will find them. The Orders table doesn’t contain orphans (which is good for the database, but bad for these examples), so just suppose it does. If you delete the first record in the Customers table?Alfreds Futterkiste, the following right outer join turns up a few orphans:
SELECT CustomersCOPY.CustomerID, CustomersCOPY.CompanyName, Orders.OrderIDFROM CustomersCOPY RIGHT JOIN Orders ON CustomersCOPY.CustomerID = Orders.CustomerIDORDER BY CustomersCOPY.CustomerID
The orphans sort to the top of the results, as shown in Figure 6. Included here is a copy of the Customers table so as not to alter the original content and design of Northwind. If you want to work through the example yourself, work with a copy of the Customers table as has been done here.
|Figure 6: A right outer join turned up a few orphans that we created on purpose, just for this example.|
A review of the complete recordset reveals that the right outer eliminates the two Null-valued records for PARIS and FISSA from the earlier example. These were the two inactive customers. Remember, the right outer retrieves all the records from the right table, which in this case is the Orders table. There are no orders for either of the above companies in the Orders table, and the join can’t show data that doesn’t exist. However, the orphans do have orders in the Orders table, even though the join can’t match them to a customer in the CustomersCOPY table. The right outer join includes the orphans despite the relationship’s inability to match each orphan’s foreign key value to a primary key value in CustomersCOPY.
A similar Oracle 9i statement resembles the following:
SELECT CustomersCOPY.CustomerID, CustomersCOPY.CompanyName, Orders.OrderIDFROM CustomersCOPY RIGHT OUTER JOIN OrdersORDER BY CustomersCOPY.CustomerID
Earlier versions use the (+) syntax in the form:
SELECT CustomersCOPY.CustomerID, CustomersCOPY.CompanyName, Orders.OrderIDFROM CustomersCOPY, OrdersWHERE CustomersCOPY.CustomerID(+) = Orders.CustomerIDORDER BY CustomersCOPY.CustomerID
Experimenting with the outer joins may lull you into believing that they’re interchangeable, but they aren’t. Remember, the left join retrieves all the records from the table on the left, the right join retrieves all the records from the table on the right. Keeping that in perspective should help you decide which join you need.
|Figure 7: A full outer join combines all records from both tables.|
Now that you know the nature of the left outer and right outer joins, you might be able to anticipate what a full outer join does. The full outer join retrieves all records from both the left and the right table. Most likely, you won’t run into a need for this join very often. A full outer join is best illustrated by slightly changing the previous example. Simply run that same statement as a full outer join, as shown in Figure 7. Review the entire recordset to see that the results now include both the orphans and the two inactive customers. The orphans are sorted to the top as they were in Figure 6. The first inactive customer is FISSA?notice that the OrderID value is Null. Scroll down to find PARIS, the other inactive customer.
A comparable Oracle 9i statement resembles the following:
SELECT CustomersCOPY.CustomerID, CustomersCOPY.CompanyName, Orders.OrderIDFROM CustomersCOPY FULL OUTER JOIN Orders ON CustomersCOPY.CustomerID = Orders.CustomerIdORDER BY CustomersCOPY.CustomerID
Or, rely on the new USING clause in the form:
SELECT CustomersCOPY.CustomerID, CustomersCOPY.CompanyName, Orders.OrderIDFROM CustomersCOPY FULL OUTER JOIN OrdersUSING (CustomersCOPY.CustomerID)ORDER BY CustomersCOPY.CustomerID
There isn’t an equivalent in earlier versions of Oracle. Earlier versions use WHERE statement with the plus sign (+) syntax, and one of the rules is that you can’t use the symbol on both sides.
A cross join returns what’s known as a Cartesian product. This means that the join combines every row from the left table with every row in the right table. As you can imagine, sometimes this join produces a mess, but under the right circumstances, it can be very useful. For instance, suppose you want to combine each product with a pricing table so you can analyze each product at each price. In one table, you have the current price of each product. In a second table, you have a list of increase values such as .05 percent, .06 percent, .10 percent, and so on. In this case, a cross join combines each product price with each increase value. The following statement creates such a matrix between the Products table and a temporary table named Amount:
SELECT Products.ProductName, Products.UnitPrice, Temporary.AmountFROM Products CROSS JOIN TemporaryORDER BY Products.ProductName, Temporary.Amount
|Figure 8: Combining all the rows can produce a matrix affect known as a Cartesian product.|
The above syntax works in SQL Server and Oracle 9i.
The resulting recordset, shown in Figure 8, creates a record for each product with each increase amount. The results are known as a Cartesian product, which combines every record in the left table with every record in the right table.
Jet and Oracle versions prior to 9i don’t support an explicit cross join, but you can get the same results using the following statement:
SELECT Products.ProductName, Products.UnitPrice, Temporary.AmountFROM Products, TemporaryORDER BY Products.ProductName, Temporary.Amount
In truth, you are joining two records where no relationship is in use.
A Note About Self Joins
There’s a special type of relationship that’s called a self join. It’s not really a type of join, since you can apply it to any join type. A self join is rather unique in that it involves a relationship with only one table. Instead of relating a table to a second table, the join is based on a relationship based on the same table. You simply reference the table twice and use an alias to avoid confusion. The common example used here is when you have a hierarchal reporting structure whereby one member of staff reports to another. In this case the employees in the employee table are related to one another in order to find out who manages whom. For example, this returns all records including those employees without a manager:
SELECT Employees.LastName, Employees.FirstName, Supervisors.LastName, Supervisors.FirstNameFROM Employees LEFT JOIN Employees AS Supervisors ON Employees.EmployeeID = Supervisors.ReportsTo
Here, Supervisors is an alias to the Employees table. The result lists each employee and the employees that report to each employee listed in the Employees table.
Oracle 9i syntax resembles the following:
SELECT Employees.LastName, Employees, FirstName, Supervisors.LastName, Supervisors.FirstNameFROM Employees LEFT OUTER JOIN Employees Supervisors ON (Employees.EmployeeID = Supervisors.ReportsTo)
If you are using an earlier version, use syntax similar to the following:
SELECT Employees.LastName, Employees.FirstName, Supervisors.LastName, Supervisors.FirstNameFROM Employees, Employees SupervisorsWHERE Employees.EmployeeID = Supervisors.ReportsTo(+)
Not Every System Is Alike
It would be impossible to review how each relational database system uses each of the SQL joins. This article has provided the syntax for Jet SQL, Transact-SQL, and Oracle SQL. Table A identifies which systems support each of the joins discussed.
Table A: System Supported Joins