TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
 Specialized Dev Zones Research Center eBook Library .NET Java C++ Web Dev Architecture Database Security Open Source Enterprise Mobile Special Reports 10-Minute Solutions DevXtra Blogs Slideshow

# Compress Binary Messages Using SQL : Page 2

## Data compression is a great method for maximizing data storage space and making data communication faster. However, compression and decompression of binary data sometimes can be quite tricky. Learn a few useful data-compression techniques.

 by Alex Kozak
 Sep 30, 2006
 Page 2 of 3

#### Data Compression

To apply what you've learned about binary numbers to data compression, follow the example in this section, which compresses binary numbers by replacing each maximal sequence of n 1's with the binary number n. The example does this only when the replacement makes the original message shorter. For example, for the original message 10111111111111111000111011 (26 bits), you will get the compressed message 10111100011011 (14 bits). (You actually can apply the same algorithm to a sequence of 0's.)

Replacing a maximal sequence of n 1's with the binary number n means you can't replace just a subset of 1's from the sequence of 1's. For example, you can't take five 1's from the 15 in the above example. You need to replace an entire piece, which will be flanked by 0's on both sides (as is the case when 1's are in the middle of a binary sequence) or bounded by a zero on one side (as is the case when 1's are at the very beginning or at the very end of a binary sequence).

Because this example mandates that replacement has to shorten the message, there is a lower limit to the sequences of 1's that you can replace. That limit is equal to three 1's. Indeed, 111 can be replaced by 11, which shortens the length of the binary message by one digit. Conversely, 11 can be replaced by 10, which will not shorten the length of the message.

Even though the smallest unit of information is one bit, processing data bit by bit is very inconvenient. Usually data is stored, transferred, or processed by bytes or by words that consist of 1, 2, 4, or 8 bytes. For my tests, I used 20-bit binary numbers generated in the previous example. You should align byte operations on those numbers to the nearest byte boundary (24 bits). However, for the purposes of data compression, the data doesn't need to be aligned to the byte boundary. It doesn't even need to keep the leading zeros, which I've deleted to make the remaining transformations lighter (see Listing 4).

SET NOCOUNT ON; DELETE binFinal WHERE decNum = 0; UPDATE binFinal SET binNum = RIGHT(binNum, LEN(binNum) - PATINDEX('%[^0]%',binNum) + 1); SELECT * FROM binFinal; Result: decNum binNum ----------- ---------------------------- 1 1 2 10 3 11 4 100 5 101 6 110 7 111 8 1000 . . . . . . . . . . . . . . . . . . . . 11219 10101111010011 11220 10101111010100 . . . . . . . . . . . . . . . . . . . . 1048573 11111111111111111101 1048574 11111111111111111110 1048575 11111111111111111111

Now it's time to try compressing the generated binary numbers in accordance with the example rules. First of all, create and load a new table similar to binFinal but with one more column that will store compressed data (see Listing 5).

Listing 5. Create and Prepare Table for Data Compression

SET NOCOUNT ON IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.binWithCompr') AND type in (N'U')) DROP TABLE dbo.binWithCompr; CREATE TABLE dbo.binWithCompr( decNum int PRIMARY KEY, originSeq varchar(50), compresSeq varchar(50)) INSERT INTO binWithCompr SELECT decNum, binNum, binNum FROM binFinal; SELECT * FROM dbo.binWithCompr; Result: decNum originSeq compresSeq ----------- ----------------------- -------------- 1 1 1 2 10 10 3 11 11 4 100 100 . . . . . . . . . . . . . . . . . .

Next, generate all the patterns of 1's presented in the binary sequences of the binFinal table (see Listing 6). The patterns can vary from 111 to 111111111111111111111.

Listing 6. Generate Patterns of 1's

DECLARE @maxLen int, @i int; SELECT @maxLen = LEN(originSeq) FROM binWithCompr WHERE decNum = (SELECT MAX(decNum) FROM binWithCompr) IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.patterns') AND type in (N'U')) DROP TABLE dbo.patterns CREATE TABLE dbo.patterns(c1 varchar(100) PRIMARY KEY, decLength int); SET @i = 3; WHILE(@i <= @maxLen) BEGIN INSERT INTO dbo.patterns VALUES(REPLICATE('1', @i), @i); SELECT @i = @i + 1; END SELECT * FROM dbo.patterns; Result: c1 decLength -------------------------------- ------------ 111 3 1111 4 11111 5 111111 6 1111111 7 11111111 8 . . . . . . . . . . . . . . . . . . 1111111111111111111 19 11111111111111111111 20

Next, replace each maximal pattern of 1's you found with its binary equivalent. The entire compression requires two steps:

1. Replace all maximal sequences of n 1's with the decimal version of n, surrounded by '*'.
2. Replace the decimal numbers with their binary equivalents and remove '*'.

Listing 7 describes the first compression step. Listing 8 is the final compression step.