By submitting your information, you agree that devx.com may send you DevX offers via email, phone and text message, as well as email offers about other products and services that DevX believes may be of interest to you. DevX will process your information in accordance with the Quinstreet Privacy Policy.

Data compression is a great method for maximizing data storage space and making data communication faster. However, compression and decompression of binary data sometimes can be quite tricky. Learn a few useful data-compression techniques.

inary data is natural for computers. Everything running on the computer's physical layer, in the registers, ALU, memory, and so on, is just a result of manipulating a set of 0's and 1's, where 0 and 1 are the logical representations of two different voltage levels. While very convenient for machine processing, base-two numeral systems have some drawbacks. For one, each binary digit can keep only two values: 0 or 1. That means large binary numbers will produce very long binary messages, which cause problems with data transfer, processing, and storage.

One solution to the problem of long binary messages is data encoding that shortens the original message—in another words, data compression. You can employ any number of compression algorithms to compress data, but what happens under the hood? Rather than discuss the fairly large number of compression algorithms out there, this article focuses on the technical aspects of data compression, namely binary data manipulations that can be useful for any compression algorithm.

Author's Note: This article developed as a SQL/Transact-SQL solution to a problem called "Bit Compressor" from the ACM International Collegiate Programming Contest (ACM-ICPC) 2006 World Finals (click here to view the problem). While the solution technique(s) can serve as an approach to completing practical tasks (generating and manipulating binary numbers, encrypting data using patterns, and detecting and fixing data transmission errors), the problem primarily serves as a brainteaser.

How to Generate Binary Numbers in SQL

While binary data is native for computers, using binary data directly in the programming layers—above assemblers and C—is very inconvenient because the human eye and brain are much more familiar with decimal numbers. In addition, binary numbers can occupy a lot of storage space when represented in long character strings. This may be why SQL and SQL Server don't have programming interfaces that allow direct manipulation of binary data. If you want to generate binary numbers with either technology, you need to find your own way to do it.

One approach for producing binary numbers is generating decimal numbers (which is very easy to do in SQL Server or SQL) and then converting them to their binary equivalents. To convert data from an n-base (decimal) to an m-base (binary) numeral system, you can use the following well-known algorithm:

Take the number in the n-base system and repeatedly divide it by the radix of the m-base numeral system.

Taken in order from last to first, the remainders of your divisions will give you the converted number.

For example, if you want to convert 87_{10} to its binary equivalent, you need to take 87 and repeatedly divide it by 2 as follows:

When you attach the remainders of all the intermediate results to the result of the last division, you get 1010111 = 2^{6} + 2^{4} + 2^{2} + 2^{1} + 2^{0} = 64 + 16 + 4 + 2 + 1 = 87_{10}. This approach works, but it may become very inefficient if you need to generate lots of binary numbers.

If you have hexadecimal numbers, you can use another simple conversion approach: just replace each hexadecimal digit by its binary equivalent. For example, you can convert E70_{16} = 14 * 16^{2} + 7 * 16^{1} = 3696_{10} to 111001110000_{2} = 2^{11} + 2^{10} + 2^{9} + 2^{6} + 2^{5} + 2^{4} = 3696_{10}.

Listing 1 shows a more interesting method for generating binary numbers with SQL cross joins.

Listing 1. How to Generate Binary Numbers Using Cross Joins

SET NOCOUNT ON
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.bin')
AND type in (N'U'))
DROP TABLE dbo.bin
CREATE TABLE bin(colBin varchar(50));
INSERT INTO bin VALUES('0');
INSERT INTO bin VALUES('1');
SELECT (b1.colBin + b2.colBin) as binNumber
FROM bin b1 CROSS JOIN bin b2
ORDER BY 1;
Result:
binNumber
----------------------------------------
00
01
10
11

Incrementing the number of cross joins by one and adding one more addend to the select list in the query from Listing 1, you will get eight binary numbers (see Listing 2).

Listing 2. How to Generate More Binary Numbers Using Cross Joins

SELECT (b1.colBin + b2.colBin + b3.colBin) as binNumber
FROM bin b1 CROSS JOIN bin b2
CROSS JOIN bin b3
ORDER BY 1;
Result:
binNumber
------------------------------------------------------------
000
001
010
011
100
101
110
111

The number of rows in the result (the amount of generated binary numbers) is 2 to the power of the number of tables participating in the cross joins (in this case, 2^{3} = 8). If you continue incrementing the number of tables participating in cross joins together with the number of addends in the select list of the query, you theoretically can get any amount of binary numbers.

The problem with this approach is the growing query. If, for example, you need to generate all possible binary numbers in 20 digits, you need to produce the query with 20 cross joins and 20 addends in the select list. The script in Listing 3 helps solve the problem of a growing query.

I ran the script for the @binLen = 20 and generated all possible binary numbers in 20 digits: 2^{20} – 1 = 1,048,575 numbers. To make it more convenient, I loaded the result into the table binFinal. That table stores all binary numbers together with their decimal equivalents.