Login | Register   
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


advertisement
 

Compress Binary Messages Using SQL

Data compression is a great method for maximizing data storage space and making data communication faster. However, compression and decompression of binary data sometimes can be quite tricky. Learn a few useful data-compression techniques.


advertisement
inary data is natural for computers. Everything running on the computer's physical layer, in the registers, ALU, memory, and so on, is just a result of manipulating a set of 0's and 1's, where 0 and 1 are the logical representations of two different voltage levels. While very convenient for machine processing, base-two numeral systems have some drawbacks. For one, each binary digit can keep only two values: 0 or 1. That means large binary numbers will produce very long binary messages, which cause problems with data transfer, processing, and storage.

One solution to the problem of long binary messages is data encoding that shortens the original message—in another words, data compression. You can employ any number of compression algorithms to compress data, but what happens under the hood? Rather than discuss the fairly large number of compression algorithms out there, this article focuses on the technical aspects of data compression, namely binary data manipulations that can be useful for any compression algorithm.

Author's Note: This article developed as a SQL/Transact-SQL solution to a problem called "Bit Compressor" from the ACM International Collegiate Programming Contest (ACM-ICPC) 2006 World Finals (click here to view the problem). While the solution technique(s) can serve as an approach to completing practical tasks (generating and manipulating binary numbers, encrypting data using patterns, and detecting and fixing data transmission errors), the problem primarily serves as a brainteaser.

How to Generate Binary Numbers in SQL

While binary data is native for computers, using binary data directly in the programming layers—above assemblers and C—is very inconvenient because the human eye and brain are much more familiar with decimal numbers. In addition, binary numbers can occupy a lot of storage space when represented in long character strings. This may be why SQL and SQL Server don't have programming interfaces that allow direct manipulation of binary data. If you want to generate binary numbers with either technology, you need to find your own way to do it.



One approach for producing binary numbers is generating decimal numbers (which is very easy to do in SQL Server or SQL) and then converting them to their binary equivalents. To convert data from an n-base (decimal) to an m-base (binary) numeral system, you can use the following well-known algorithm:

  1. Take the number in the n-base system and repeatedly divide it by the radix of the m-base numeral system.
  2. Taken in order from last to first, the remainders of your divisions will give you the converted number.

For example, if you want to convert 8710 to its binary equivalent, you need to take 87 and repeatedly divide it by 2 as follows:

87/2 = 43, remainder 1 43/2 = 21, remainder 1 21/2 = 10, remainder 1 10/2 = 5, remainder 0 5/2 = 2, remainder 1 2/2 = 1 remainder 0

When you attach the remainders of all the intermediate results to the result of the last division, you get 1010111 = 26 + 24 + 22 + 21 + 20 = 64 + 16 + 4 + 2 + 1 = 8710. This approach works, but it may become very inefficient if you need to generate lots of binary numbers.

If you have hexadecimal numbers, you can use another simple conversion approach: just replace each hexadecimal digit by its binary equivalent. For example, you can convert E7016 = 14 * 162 + 7 * 161 = 369610 to 1110011100002 = 211 + 210 + 29 + 26 + 25 + 24 = 369610.

Listing 1 shows a more interesting method for generating binary numbers with SQL cross joins.

Listing 1. How to Generate Binary Numbers Using Cross Joins

SET NOCOUNT ON IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.bin') AND type in (N'U')) DROP TABLE dbo.bin CREATE TABLE bin(colBin varchar(50)); INSERT INTO bin VALUES('0'); INSERT INTO bin VALUES('1'); SELECT (b1.colBin + b2.colBin) as binNumber FROM bin b1 CROSS JOIN bin b2 ORDER BY 1; Result: binNumber ---------------------------------------- 00 01 10 11

Incrementing the number of cross joins by one and adding one more addend to the select list in the query from Listing 1, you will get eight binary numbers (see Listing 2).

Listing 2. How to Generate More Binary Numbers Using Cross Joins

SELECT (b1.colBin + b2.colBin + b3.colBin) as binNumber FROM bin b1 CROSS JOIN bin b2 CROSS JOIN bin b3 ORDER BY 1; Result: binNumber ------------------------------------------------------------ 000 001 010 011 100 101 110 111

The number of rows in the result (the amount of generated binary numbers) is 2 to the power of the number of tables participating in the cross joins (in this case, 23 = 8). If you continue incrementing the number of tables participating in cross joins together with the number of addends in the select list of the query, you theoretically can get any amount of binary numbers.

The problem with this approach is the growing query. If, for example, you need to generate all possible binary numbers in 20 digits, you need to produce the query with 20 cross joins and 20 addends in the select list. The script in Listing 3 helps solve the problem of a growing query.

I ran the script for the @binLen = 20 and generated all possible binary numbers in 20 digits: 220 – 1 = 1,048,575 numbers. To make it more convenient, I loaded the result into the table binFinal. That table stores all binary numbers together with their decimal equivalents.



Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap