RSS Feed
Download our iPhone app
Browse DevX
Sign up for e-mail newsletters from DevX


Compress Binary Messages Using SQL : Page 3

Data compression is a great method for maximizing data storage space and making data communication faster. However, compression and decompression of binary data sometimes can be quite tricky. Learn a few useful data-compression techniques.


Message Decompression

When you have a table with original and compressed values, you can easily decompress data. All you need to do is query the table and find the original message corresponding to what you just received. Listing 9 is an example of message decompression.
Listing 9. Message Decompression

DECLARE @compresSeq VARCHAR(20); SELECT @compresSeq = '10001011'; SELECT * FROM binWithCompr WHERE compresSeq = @compresSeq; Result: decNum originSeq compresSeq ----------- --------------------------------- --------------- 139 10001011 10001011 279 100010111 10001011 491 111101011 10001011 983 1111010111 10001011 18431 100011111111111 10001011 63487 1111011111111111 10001011 1048571 11111111111111111011 10001011

Oops, there's a problem. Seven different original messages can be compressed to the same binary number: 10001011. That means you need to get more information in order to find the correct answer. If you knew the number of 1's in the original message, it would definitely help. Listing 10 shows how you can find the number of 1's and/or 0's in all binary numbers in the table using only one query.

Listing 10. How to Find Number of 1's and/or 0's in Binary Messages

DECLARE @compresSeq VARCHAR(20); SELECT @compresSeq = '10001011'; SELECT COUNT(*) AS num_of_1s, SUBSTRING(t2.originSeq, t1.decNum,1) AS binVvalue, t2.originSeq, t2.compresSeq FROM (SELECT decNum FROM binWithCompr WHERE decNum <= (SELECT MAX(LEN(originSeq)) FROM binWithCompr WHERE compresSeq = @compresSeq)) t1 CROSS JOIN (SELECT originSeq, compresSeq FROM binWithCompr WHERE compresSeq = @compresSeq) t2 WHERE SUBSTRING(t2.originSeq, t1.decNum,1) = '1' GROUP BY SUBSTRING(t2.originSeq, t1.decNum,1), t2.originSeq, t2.compresSeq; Result: num_of_1s binVvalue originSeq compresSeq ----------- -------------- ------------------- -------------------- 7 1 111101011 10001011 12 1 100011111111111 10001011 19 1 11111111111111111011 10001011 8 1 1111010111 10001011 4 1 10001011 10001011 5 1 100010111 10001011 15 1 1111011111111111 10001011

As you can see, the information about the number of 1's in the original message received with the compressed message can help you uniquely identify the original message. But this is not always correct. In some cases, different original messages with the same number of 1's can be converted into the same compressed message. For example, if you run the script from Listing 10 for @compresSeq = '1010101', you will get the following result:

num_of_1s   binValue 	originSeq  	compresSeq
----------- --------- 	------------ -------------
7           1         	111110101    1010101
10          1         	11111011111  1010101
7           1         	101011111    1010101
7           1         	101111101    1010101
4           1         	1010101      1010101

As you can see, three original messages have seven 1's and two 0's, which can be compressed into the message 1010101. Therefore, there is no way to uniquely identify the original message (unless you have some additional information about it). All that means is you should use this type of compression only in situations where some data loss is acceptable (lossy data compression/decompression such as in image processing and transmission).

Message Size Matters

The example uses a hash table I created to make the decompression process easier. For obvious reasons, the described approach can be applied only to relatively short (2-4 bytes) binary messages. The number of binary combinations in 32 bits is 232 = 4,294,967,296. Generating that amount of binary numbers is difficult but doable. Generating all binary combinations for messages that are hundreds or thousands of bits in length, however, is impossible.

Nevertheless, you can use the techniques shown in this article in many data compression implementations and/or when you need to manipulate binary data.

Alex Kozak is a senior DBA/analyst working for SAP Canada. He has more than 15 years of database and programming experience. Microsoft has included some of his articles in the MSDN Library.
Email AuthorEmail Author
Close Icon
Thanks for your registration, follow us on our social networks to keep up-to-date