Home » Compress Binary Messages Using SQL

Compress Binary Messages Using SQL

inary data is natural for computers. Everything running on the computer’s physical layer, in the registers, ALU, memory, and so on, is just a result of manipulating a set of 0’s and 1’s, where 0 and 1 are the logical representations of two different voltage levels. While very convenient for machine processing, base-two numeral systems have some drawbacks. For one, each binary digit can keep only two values: 0 or 1. That means large binary numbers will produce very long binary messages, which cause problems with data transfer, processing, and storage.

One solution to the problem of long binary messages is data encoding that shortens the original message?in another words, data compression. You can employ any number of compression algorithms to compress data, but what happens under the hood? Rather than discuss the fairly large number of compression algorithms out there, this article focuses on the technical aspects of data compression, namely binary data manipulations that can be useful for any compression algorithm.

Author’s Note: This article developed as a SQL/Transact-SQL solution to a problem called “Bit Compressor” from the ACM International Collegiate Programming Contest (ACM-ICPC) 2006 World Finals (click here to view the problem). While the solution technique(s) can serve as an approach to completing practical tasks (generating and manipulating binary numbers, encrypting data using patterns, and detecting and fixing data transmission errors), the problem primarily serves as a brainteaser.

How to Generate Binary Numbers in SQL

While binary data is native for computers, using binary data directly in the programming layers?above assemblers and C?is very inconvenient because the human eye and brain are much more familiar with decimal numbers. In addition, binary numbers can occupy a lot of storage space when represented in long character strings. This may be why SQL and SQL Server don’t have programming interfaces that allow direct manipulation of binary data. If you want to generate binary numbers with either technology, you need to find your own way to do it.

One approach for producing binary numbers is generating decimal numbers (which is very easy to do in SQL Server or SQL) and then converting them to their binary equivalents. To convert data from an n-base (decimal) to an m-base (binary) numeral system, you can use the following well-known algorithm:

Take the number in the n-base system and repeatedly divide it by the radix of the m-base numeral system.
Taken in order from last to first, the remainders of your divisions will give you the converted number.

For example, if you want to convert 87₁₀ to its binary equivalent, you need to take 87 and repeatedly divide it by 2 as follows:

87/2  = 43, remainder 143/2  = 21, remainder 121/2  = 10, remainder 110/2  = 5,  remainder 05/2   = 2,  remainder 12/2   = 1   remainder 0

When you attach the remainders of all the intermediate results to the result of the last division, you get 1010111 = 2⁶ + 2⁴ + 2² + 2¹ + 2⁰ = 64 + 16 + 4 + 2 + 1 = 87₁₀. This approach works, but it may become very inefficient if you need to generate lots of binary numbers.

If you have hexadecimal numbers, you can use another simple conversion approach: just replace each hexadecimal digit by its binary equivalent. For example, you can convert E70₁₆ = 14 * 16² + 7 * 16¹ = 3696₁₀ to 111001110000₂ = 2¹¹ + 2¹⁰ + 2⁹ + 2⁶ + 2⁵ + 2⁴ = 3696₁₀.

Listing 1 shows a more interesting method for generating binary numbers with SQL cross joins.

Listing 1. How to Generate Binary Numbers Using Cross JoinsSET NOCOUNT ONIF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.bin')               AND type in (N'U'))DROP TABLE dbo.binCREATE TABLE bin(colBin varchar(50));INSERT INTO bin VALUES('0');INSERT INTO bin VALUES('1');SELECT (b1.colBin + b2.colBin) as binNumber   FROM bin b1 CROSS JOIN bin b2   ORDER BY 1;Result:binNumber----------------------------------------00011011

Incrementing the number of cross joins by one and adding one more addend to the select list in the query from Listing 1, you will get eight binary numbers (see Listing 2).

Listing 2. How to Generate More Binary Numbers Using Cross Joins

SELECT (b1.colBin + b2.colBin + b3.colBin) as binNumber FROM bin b1 CROSS JOIN bin b2 CROSS JOIN bin b3 ORDER BY 1;Result:binNumber------------------------------------------------------------000001010011100101110111

The number of rows in the result (the amount of generated binary numbers) is 2 to the power of the number of tables participating in the cross joins (in this case, 2³ = 8). If you continue incrementing the number of tables participating in cross joins together with the number of addends in the select list of the query, you theoretically can get any amount of binary numbers.

The problem with this approach is the growing query. If, for example, you need to generate all possible binary numbers in 20 digits, you need to produce the query with 20 cross joins and 20 addends in the select list. The script in Listing 3 helps solve the problem of a growing query.

I ran the script for the @binLen = 20 and generated all possible binary numbers in 20 digits: 2²⁰ ? 1 = 1,048,575 numbers. To make it more convenient, I loaded the result into the table binFinal. That table stores all binary numbers together with their decimal equivalents.

Data Compression

To apply what you’ve learned about binary numbers to data compression, follow the example in this section, which compresses binary numbers by replacing each maximal sequence of n 1’s with the binary number n. The example does this only when the replacement makes the original message shorter. For example, for the original message 10111111111111111000111011 (26 bits), you will get the compressed message 10111100011011 (14 bits). (You actually can apply the same algorithm to a sequence of 0’s.)

Replacing a maximal sequence of n 1’s with the binary number n means you can’t replace just a subset of 1’s from the sequence of 1’s. For example, you can’t take five 1’s from the 15 in the above example. You need to replace an entire piece, which will be flanked by 0’s on both sides (as is the case when 1’s are in the middle of a binary sequence) or bounded by a zero on one side (as is the case when 1’s are at the very beginning or at the very end of a binary sequence).

Because this example mandates that replacement has to shorten the message, there is a lower limit to the sequences of 1’s that you can replace. That limit is equal to three 1’s. Indeed, 111 can be replaced by 11, which shortens the length of the binary message by one digit. Conversely, 11 can be replaced by 10, which will not shorten the length of the message.

Even though the smallest unit of information is one bit, processing data bit by bit is very inconvenient. Usually data is stored, transferred, or processed by bytes or by words that consist of 1, 2, 4, or 8 bytes. For my tests, I used 20-bit binary numbers generated in the previous example. You should align byte operations on those numbers to the nearest byte boundary (24 bits). However, for the purposes of data compression, the data doesn’t need to be aligned to the byte boundary. It doesn’t even need to keep the leading zeros, which I’ve deleted to make the remaining transformations lighter (see Listing 4).

Listing 4. Deleting Leading ZerosSET NOCOUNT ON;DELETE binFinal WHERE decNum = 0;UPDATE binFinal	SET binNum = RIGHT(binNum, LEN(binNum) - PATINDEX('%[^0]%',binNum) + 1);SELECT * FROM binFinal; Result:decNum      binNum----------- ----------------------------1           12           103           114           1005           1016           1107           1118           1000. . . . . . . . . . . . . . . . . . . .11219       1010111101001111220       10101111010100. . . . . . . . . . . . . . . . . . . .1048573     111111111111111111011048574     111111111111111111101048575     11111111111111111111

Now it’s time to try compressing the generated binary numbers in accordance with the example rules. First of all, create and load a new table similar to binFinal but with one more column that will store compressed data (see Listing 5).

Listing 5. Create and Prepare Table for Data CompressionSET NOCOUNT ONIF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.binWithCompr')               AND type in (N'U'))DROP TABLE dbo.binWithCompr;CREATE TABLE dbo.binWithCompr(		decNum int PRIMARY KEY, 		originSeq varchar(50), 		compresSeq varchar(50)) INSERT INTO binWithComprSELECT decNum, binNum, binNum FROM binFinal;SELECT * FROM dbo.binWithCompr;Result:decNum      originSeq               compresSeq----------- ----------------------- --------------1           1                       12           10                      103           11                      114           100                     100. . . . . . . . . . . . . . . . . .

Next, generate all the patterns of 1’s presented in the binary sequences of the binFinal table (see Listing 6). The patterns can vary from 111 to 111111111111111111111.

Listing 6. Generate Patterns of 1'sDECLARE @maxLen int, @i int;  SELECT @maxLen = LEN(originSeq) FROM binWithCompr   WHERE decNum = (SELECT MAX(decNum) FROM binWithCompr)IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.patterns')               AND type in (N'U'))DROP TABLE dbo.patternsCREATE TABLE dbo.patterns(c1 varchar(100) PRIMARY KEY, decLength int);SET @i = 3;WHILE(@i <= @maxLen)BEGIN   INSERT INTO dbo.patterns VALUES(REPLICATE('1', @i), @i);   SELECT @i = @i + 1;	ENDSELECT * FROM dbo.patterns;Result:c1                                 decLength--------------------------------   ------------111                                31111                               411111                              5111111                             61111111                            711111111                           8. . . . . . . . . . . . . . . . . . 1111111111111111111                1911111111111111111111               20

Next, replace each maximal pattern of 1's you found with its binary equivalent. The entire compression requires two steps:

Replace all maximal sequences of n 1's with the decimal version of n, surrounded by '*'.
Replace the decimal numbers with their binary equivalents and remove '*'.

Listing 7 describes the first compression step. Listing 8 is the final compression step.

Message Decompression

When you have a table with original and compressed values, you can easily decompress data. All you need to do is query the table and find the original message corresponding to what you just received. Listing 9 is an example of message decompression.

Listing 9. Message DecompressionDECLARE @compresSeq VARCHAR(20);SELECT @compresSeq = '10001011';SELECT * FROM binWithCompr    WHERE compresSeq = @compresSeq;Result:decNum      originSeq                         compresSeq----------- --------------------------------- ---------------139         10001011                          10001011279         100010111                         10001011491         111101011                         10001011983         1111010111                        1000101118431       100011111111111                   1000101163487       1111011111111111                  100010111048571     11111111111111111011              10001011

Oops, there's a problem. Seven different original messages can be compressed to the same binary number: 10001011. That means you need to get more information in order to find the correct answer. If you knew the number of 1's in the original message, it would definitely help. Listing 10 shows how you can find the number of 1's and/or 0's in all binary numbers in the table using only one query.

Listing 10. How to Find Number of 1's and/or 0's in Binary MessagesDECLARE @compresSeq VARCHAR(20);SELECT @compresSeq = '10001011';SELECT COUNT(*) AS num_of_1s,        SUBSTRING(t2.originSeq, t1.decNum,1) AS binVvalue,        t2.originSeq,       t2.compresSeq	FROM (SELECT decNum FROM binWithCompr             WHERE decNum <=            (SELECT MAX(LEN(originSeq)) FROM binWithCompr               WHERE compresSeq = @compresSeq)) t1    CROSS JOIN    (SELECT originSeq, compresSeq FROM binWithCompr        WHERE compresSeq = @compresSeq) t2   WHERE SUBSTRING(t2.originSeq, t1.decNum,1) = '1'   GROUP BY  SUBSTRING(t2.originSeq, t1.decNum,1), t2.originSeq, t2.compresSeq;Result:num_of_1s   binVvalue 	originSeq 		compresSeq----------- -------------- ------------------- --------------------7           1         111101011                1000101112          1         100011111111111          1000101119          1         11111111111111111011     100010118           1         1111010111               100010114           1         10001011                 100010115           1         100010111                1000101115          1         1111011111111111         10001011

As you can see, the information about the number of 1's in the original message received with the compressed message can help you uniquely identify the original message.But this is not always correct. In some cases, different original messages with the same number of 1's can be converted into the same compressed message. For example, if you run the script from Listing 10 for @compresSeq = '1010101', you will get the following result:

num_of_1s   binValue 	originSeq  	compresSeq----------- --------- 	------------ -------------7           1         	111110101    101010110          1         	11111011111  10101017           1         	101011111    10101017           1         	101111101    10101014           1         	1010101      1010101

As you can see, three original messages have seven 1's and two 0's, which can be compressed into the message 1010101. Therefore, there is no way to uniquely identify the original message (unless you have some additional information about it). All that means is you should use this type of compression only in situations where some data loss is acceptable (lossy data compression/decompression such as in image processing and transmission).

Message Size Matters

The example uses a hash table I created to make the decompression process easier. For obvious reasons, the described approach can be applied only to relatively short (2-4 bytes) binary messages. The number of binary combinations in 32 bits is 2³² = 4,294,967,296. Generating that amount of binary numbers is difficult but doable. Generating all binary combinations for messages that are hundreds or thousands of bits in length, however, is impossible.

Nevertheless, you can use the techniques shown in this article in many data compression implementations and/or when you need to manipulate binary data.

About Our Editorial Process

At DevX, we’re dedicated to tech entrepreneurship. Our team closely follows industry shifts, new products, AI breakthroughs, technology trends, and funding announcements. Articles undergo thorough editing to ensure accuracy and clarity, reflecting DevX’s style and supporting entrepreneurs in the tech sphere.

See our full editorial policy.