Finding Two Equal Sets of Data

Finding Two Equal Sets of Data

[Joe Celko’s Equal Sets Puzzle]

Set theory has two symbols for subsets. One is a “horseshoe” on its side, which means that set A is contained within set B and is sometimes called a proper subset. The other is the same symbol with a horizontal bar under it, which means “contained in or equal to,” which is sometimes called a subset or a containment operator.

Standard SQL has never had an operator to compare tables against each other. Several college text books on Relational Databases mention a CONTAINS predicate which does not exist in standard SQL-89 nor was it proposed for SQL-92. Two such offenders are An Introduction to Data Base Systems by Bipin C. Desai (West Publishing 1990, ISBN 0-314-66771-7) and Fundamentals of Database Systems by Elmasri and Navthe (Benjamin Cummings 1989, ISBN 0-8053-0145-3). This predicate used to exist in the original System R, IBM’s first experimental SQL system, but it was dropped from later SQL implementations because of the expense of running it.

The IN predicate is a test for membership, not for subsets. For those of you who remember your high school set theory, membership is shown with a stylized epsilon with the containing set on the right side. Membership is for one element, while a subset is itself a set, not just an element.

Chris Date’s puzzle in the December issue of Database Programming & Design magazine (“A Matter of Integrity, Part II” According to Date, 1993 December) was to use a supplier and parts table to find pairs of suppliers who provide EXACTLY the same parts. This is the same thing as finding two equal sets.

    CREATE TABLE SupParts        (sno CHAR(2) NOT NULL,        pno CHAR(2) NOT NULL,        PRIMARY KEY (sno, pno));
The usual way of proving that two sets are equal to each other is to show that set A contains set B, and set B contains set A. What you would usually do in standard SQL would be to show that there exists no element in set A that is not in set B and therefore A is a subset of B.

See also  5 Tips For Working With An Onsite Interpreter

Instead of using subsets, I thought I would look for another way to do set equality. First I join one supplier to another on their common parts, eliminating the situation where supplier one is the same as supplier two, so that I have the intersection of the two sets. If the intersection has the same number of pairs as each of the two sets has elements, then the two sets are equal.

    SELECT SP1.sno AS Supplier1, SP2.sno AS Supplier2        FROM SupParts AS SP1 JOIN SupParts AS SP2            ON (SP1.pno = SP2.pno AND SP1.sno < SP2.sno)        GROUP BY Supplier1, Supplier2        HAVING (SELECT COUNT(*)            FROM SupParts AS SP3            WHERE SP3.sno = SP1.sno)                = (SELECT COUNT(*)                FROM SupParts AS SP4                WHERE SP4.sno = SP2.sno);
This was tested in Watcom SQL and uses some SQL-92 features. If there is an index on the supplier number in the SupParts table, it can provide the counts directly as well as help with the join operation.

It is left to the reader as an exercise to modify the query to do a proper subset operation. Hint: first remember to change the “less than” to “not equal”, then look at the WHERE clause.

Puzzle provided courtesy of:
Joe Celko
[email protected]


About Our Editorial Process

At DevX, we’re dedicated to tech entrepreneurship. Our team closely follows industry shifts, new products, AI breakthroughs, technology trends, and funding announcements. Articles undergo thorough editing to ensure accuracy and clarity, reflecting DevX’s style and supporting entrepreneurs in the tech sphere.

See our full editorial policy.

About Our Journalist