Finding Two Equal Sets of Data

Finding Two Equal Sets of Data

[Joe Celko’s Equal Sets Puzzle]

Set theory has two symbols for subsets. One is a “horseshoe” on its side, which means that set A is contained within set B and is sometimes called a proper subset. The other is the same symbol with a horizontal bar under it, which means “contained in or equal to,” which is sometimes called a subset or a containment operator.

Standard SQL has never had an operator to compare tables against each other. Several college text books on Relational Databases mention a CONTAINS predicate which does not exist in standard SQL-89 nor was it proposed for SQL-92. Two such offenders are An Introduction to Data Base Systems by Bipin C. Desai (West Publishing 1990, ISBN 0-314-66771-7) and Fundamentals of Database Systems by Elmasri and Navthe (Benjamin Cummings 1989, ISBN 0-8053-0145-3). This predicate used to exist in the original System R, IBM’s first experimental SQL system, but it was dropped from later SQL implementations because of the expense of running it.

The IN predicate is a test for membership, not for subsets. For those of you who remember your high school set theory, membership is shown with a stylized epsilon with the containing set on the right side. Membership is for one element, while a subset is itself a set, not just an element.

Chris Date’s puzzle in the December issue of Database Programming & Design magazine (“A Matter of Integrity, Part II” According to Date, 1993 December) was to use a supplier and parts table to find pairs of suppliers who provide EXACTLY the same parts. This is the same thing as finding two equal sets.

    CREATE TABLE SupParts        (sno CHAR(2) NOT NULL,        pno CHAR(2) NOT NULL,        PRIMARY KEY (sno, pno));
The usual way of proving that two sets are equal to each other is to show that set A contains set B, and set B contains set A. What you would usually do in standard SQL would be to show that there exists no element in set A that is not in set B and therefore A is a subset of B.

Instead of using subsets, I thought I would look for another way to do set equality. First I join one supplier to another on their common parts, eliminating the situation where supplier one is the same as supplier two, so that I have the intersection of the two sets. If the intersection has the same number of pairs as each of the two sets has elements, then the two sets are equal.

    SELECT SP1.sno AS Supplier1, SP2.sno AS Supplier2        FROM SupParts AS SP1 JOIN SupParts AS SP2            ON (SP1.pno = SP2.pno AND SP1.sno < SP2.sno)        GROUP BY Supplier1, Supplier2        HAVING (SELECT COUNT(*)            FROM SupParts AS SP3            WHERE SP3.sno = SP1.sno)                = (SELECT COUNT(*)                FROM SupParts AS SP4                WHERE SP4.sno = SP2.sno);
This was tested in Watcom SQL and uses some SQL-92 features. If there is an index on the supplier number in the SupParts table, it can provide the counts directly as well as help with the join operation.

It is left to the reader as an exercise to modify the query to do a proper subset operation. Hint: first remember to change the “less than” to “not equal”, then look at the WHERE clause.

Puzzle provided courtesy of:
Joe Celko
[email protected]

Share the Post:
XDR solutions

The Benefits of Using XDR Solutions

Cybercriminals constantly adapt their strategies, developing newer, more powerful, and intelligent ways to attack your network. Since security professionals must innovate as well, more conventional endpoint detection solutions have evolved

AI is revolutionizing fraud detection

How AI is Revolutionizing Fraud Detection

Artificial intelligence – commonly known as AI – means a form of technology with multiple uses. As a result, it has become extremely valuable to a number of businesses across

AI innovation

Companies Leading AI Innovation in 2023

Artificial intelligence (AI) has been transforming industries and revolutionizing business operations. AI’s potential to enhance efficiency and productivity has become crucial to many businesses. As we move into 2023, several

data fivetran pricing

Fivetran Pricing Explained

One of the biggest trends of the 21st century is the massive surge in analytics. Analytics is the process of utilizing data to drive future decision-making. With so much of

kubernetes logging

Kubernetes Logging: What You Need to Know

Kubernetes from Google is one of the most popular open-source and free container management solutions made to make managing and deploying applications easier. It has a solid architecture that makes

ransomware cyber attack

Why Is Ransomware Such a Major Threat?

One of the most significant cyber threats faced by modern organizations is a ransomware attack. Ransomware attacks have grown in both sophistication and frequency over the past few years, forcing

data dictionary

Tools You Need to Make a Data Dictionary

Data dictionaries are crucial for organizations of all sizes that deal with large amounts of data. they are centralized repositories of all the data in organizations, including metadata such as