

Bulky Data Is No Problem Thanks to Compression/Decompression in .NET 2.0

If you never need to use compression for your applications, consider yourself lucky. For the rest of us, the good news is that .NET 2.0 has two new classes to handle compression and decompression streams. Find out when, and how, to use these valuable facilities.





One of the new namespaces in version 2.0 of the .NET Framework is System.IO.Compression. This new namespace contains two classes for data compression: DeflateStream and GZipStream. Both compression classes support lossless compression and decompression and are designed for dealing with streams.

Tip: For more background information on how the Deflate and GZip algorithms work, refer to http://www.gzip.org/deflate.html.

Compression is useful for reducing the size of data. For example, if you have a huge amount of data to store in your SQL database, you can save disk space by compressing the data before saving it into a table. Moreover, because you are now saving smaller blocks of data into your database, less time is spent on disk I/O. The downside is that compression takes additional processing power from your machine (and hence additional processing time), and you need to factor in this time before deciding to use compression in your application.

Compression is extremely useful in cases where you need to transmit data over networks, especially slow and costly networks such as GPRS connections. In such cases, compression can drastically cut down the data size and reduce the overall cost of communication. Web services are another area where compression can provide a great advantage, since XML data usually compresses very well.

But once you've decided the performance cost is worth it, you'll need to know how to use .NET 2.0's two new compression classes, which is what I'll show in this article.

Creating the Sample Application
As I often do in my articles, I'll build a sample application, which I'll use to illustrate the use of compression. The application allows you to compress files as well as plain text. You will then be able to reuse the code snippets in this example for your own application.

Using Visual Studio 2005, create a new Windows application and populate the default Form1 with the following controls (see also Figure 1):

  • GroupBox controls
  • RadioButton controls
  • TextBox controls
  • Button controls
  • Label controls

Figure 1. Populate the Form: Populate the default Form1 with all the controls shown.

Switch to the code-behind of Form1 and import the following namespaces:

Imports System.IO
Imports System.IO.Compression

Before you start to use the compression classes, it is important to understand how they work. The compression classes read data (to be compressed) from a byte array, compress it and store the results into a stream object. For decompression, the compressed data stored in a stream object is decompressed and then stored in another stream object.

First, define the Compress() function, which takes in two parameters: algo and data. The first parameter specifies which algorithm to use (GZip or Deflate) and the second parameter is a byte array that contains the data to compress. A memory stream object will be used to store the compressed data. Once the compression is done, you need to calculate the compression ratio, which is done by dividing the size of the compressed data by the size of the uncompressed data.

Author's Note: This compression ratio is sometimes expressed as a percentage. For example, if the uncompressed data size is 10MB and the compressed data size is 5MB, then the compression ratio is 50 percent. For efficient compression, you should aim for as low a compression ratio as possible. A compression ratio of more than 100 percent is no good: you are worse off than without compression.
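To make the arithmetic in the note concrete, here is a minimal sketch of the calculation as a standalone console module. The module and the CompressionRatio() helper are hypothetical names used only for illustration; they are not part of the sample application.

```vbnet
Imports System

Module RatioExample
    ' Hypothetical helper: compressed size as a percentage of the original size.
    ' Lower is better; anything above 100 means the data grew.
    Function CompressionRatio(ByVal originalSize As Long, _
                              ByVal compressedSize As Long) As Double
        ' Avoid dividing by zero when there is no input at all
        If originalSize = 0 Then Return 0.0
        Return Math.Round(CDbl(compressedSize) / CDbl(originalSize) * 100.0, 2)
    End Function

    Sub Main()
        ' The case from the note: 10MB of input compressed down to 5MB
        Console.WriteLine(CompressionRatio(10L * 1024 * 1024, 5L * 1024 * 1024))

        ' Input that does not compress well can come out slightly larger
        Console.WriteLine(CompressionRatio(1000, 1018))
    End Sub
End Module
```

The first call corresponds to the 50 percent case described above. The second shows a ratio just over 100 percent, which is the situation where compression is not worth using.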

The compressed data stored in the memory stream is then copied into another byte array and returned to the calling function. In addition, you will use a Stopwatch object to keep track of how much time the compression algorithm takes. The Compress() function is defined as follows:

Public Function Compress(ByVal algo As String, ByVal data() As Byte) As Byte()
    Try
        Dim sw As New Stopwatch
        '---the ms is used for storing the compressed data---
        Dim ms As New MemoryStream()
        Dim zipStream As Stream = Nothing

        '---start the stopwatch---
        sw.Start()
        If algo = "Gzip" Then
            zipStream = New GZipStream(ms, CompressionMode.Compress, True)
        ElseIf algo = "Deflate" Then
            zipStream = New DeflateStream(ms, CompressionMode.Compress, True)
        End If

        '---compressing using the info stored in data---
        zipStream.Write(data, 0, data.Length)
        zipStream.Close()

        '---stop the stopwatch---
        sw.Stop()

        '---calculate the compression ratio---
        Dim ratio As Single = Math.Round((ms.Length / data.Length) * 100, 2)
        Dim msg As String = "Original size: " & data.Length & _
            ", Compressed size: " & ms.Length & _
            ", Compression ratio: " & ratio & "%" & _
            ", Time spent: " & sw.ElapsedMilliseconds & "ms"
        lblMessage.Text = msg
        ms.Position = 0

        '---used to store the compressed data (byte array)---
        Dim c_data(ms.Length - 1) As Byte
        '---read the content of the memory stream into the byte array---
        ms.Read(c_data, 0, ms.Length)
        Return c_data
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return Nothing
    End Try
End Function

The Decompress() function will decompress the data compressed by the Compress() function. The first parameter specifies the algorithm to use. The byte array containing the compressed data is passed in as the second parameter, which is then copied into a memory stream object. The compression classes will then decompress the data stored in the memory stream and then store the decompressed data into another stream object. In order to obtain the decompressed data, you need to read the data from the stream object. This is accomplished by the RetrieveBytesFromStream() function, which I'll define next.

The Decompress() function is defined as follows:

Public Function Decompress(ByVal algo As String, ByVal data() As Byte) As Byte()
    Try
        Dim sw As New Stopwatch
        '---copy the data (compressed) into ms---
        Dim ms As New MemoryStream(data)
        Dim zipStream As Stream = Nothing

        '---start the stopwatch---
        sw.Start()

        '---decompressing using data stored in ms---
        If algo = "Gzip" Then
            zipStream = New GZipStream(ms, CompressionMode.Decompress)
        ElseIf algo = "Deflate" Then
            zipStream = New DeflateStream(ms, CompressionMode.Decompress, True)
        End If

        '---used to store the decompressed data---
        Dim dc_data() As Byte

        '---the decompressed data is stored in zipStream;
        ' extract them out into a byte array---
        dc_data = RetrieveBytesFromStream(zipStream, data.Length)

        '---stop the stopwatch---
        sw.Stop()
        lblMessage.Text = "Decompression completed. Time spent: " & _
            sw.ElapsedMilliseconds & "ms" & _
            ", Original size: " & dc_data.Length
        Return dc_data
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return Nothing
    End Try
End Function

The RetrieveBytesFromStream() function takes in two parameters (a stream object and an integer) and returns a byte array containing the decompressed data. The integer parameter determines how many bytes to read from the stream object into the byte array at a time. This is necessary because you do not know the exact size of the decompressed data in advance, so the byte array has to be expanded dynamically, block by block, at runtime. Reserving too large a block wastes memory, while reserving too small a block wastes time, because the byte array must be expanded again and again. It is therefore up to the calling routine to determine the optimal block size to read.

Define the RetrieveBytesFromStream() function as follows:

Public Function RetrieveBytesFromStream( _
    ByVal stream As Stream, ByVal bytesblock As Integer) As Byte()

    '---retrieve the bytes from a stream object---
    Dim data() As Byte
    Dim totalCount As Integer = 0
    Try
        While True
            '---progressively increase the size of the data byte array---
            ReDim Preserve data(totalCount + bytesblock)
            Dim bytesRead As Integer = stream.Read(data, totalCount, bytesblock)
            If bytesRead = 0 Then
                Exit While
            End If
            totalCount += bytesRead
        End While

        '---make sure the byte array contains exactly the number
        ' of bytes extracted---
        ReDim Preserve data(totalCount - 1)
        Return data
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return Nothing
    End Try
End Function
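As a point of comparison, here is a sketch of a different technique for the same job: read fixed-size chunks and append them to a MemoryStream, which grows its own internal buffer. This is not the approach the sample application uses. The ReadAllBytes() helper, the module name, and the 4096-byte chunk size are assumptions made for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression

Module ReadAllExample
    ' Hypothetical alternative to RetrieveBytesFromStream(): read fixed-size
    ' chunks and let a MemoryStream manage the growth of the result buffer.
    Function ReadAllBytes(ByVal source As Stream, ByVal chunkSize As Integer) As Byte()
        Dim chunk(chunkSize - 1) As Byte
        Dim result As New MemoryStream()
        Dim bytesRead As Integer = source.Read(chunk, 0, chunk.Length)
        While bytesRead > 0
            ' Append only the bytes that were actually read
            result.Write(chunk, 0, bytesRead)
            bytesRead = source.Read(chunk, 0, chunk.Length)
        End While
        Return result.ToArray()
    End Function

    Sub Main()
        ' Highly repetitive sample data, so it compresses well
        Dim original() As Byte = _
            System.Text.Encoding.UTF8.GetBytes(New String("a"c, 10000))

        ' Compress into a memory stream, leaving it open for reading afterwards
        Dim compressed As New MemoryStream()
        Dim zipOut As New GZipStream(compressed, CompressionMode.Compress, True)
        zipOut.Write(original, 0, original.Length)
        zipOut.Close()

        ' Rewind and decompress through the helper
        compressed.Position = 0
        Dim zipIn As New GZipStream(compressed, CompressionMode.Decompress)
        Dim restored() As Byte = ReadAllBytes(zipIn, 4096)
        zipIn.Close()

        Console.WriteLine("Restored " & restored.Length & _
            " of " & original.Length & " bytes")
    End Sub
End Module
```

The trade-off is one extra copy at the end (ToArray) in exchange for not having to choose a block size that matches the data.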

Recall that in the Decompress() function, you called the RetrieveBytesFromStream() function with the following:

dc_data = RetrieveBytesFromStream(zipStream, data.Length)

The block size is the size of the compressed data (data.Length). In most cases, the uncompressed data is a few times larger than the compressed data (as indicated by the compression ratio), and hence the byte array is expanded only a few times at runtime. As an example, suppose the compression ratio is 20 percent and the size of the compressed data is 2MB. In this case, the uncompressed data would be 10MB, so it takes about five blocks of 2MB each to hold it, assuming each read fills a whole block. Ideally, the byte array should not be expanded too frequently at runtime, because each expansion copies the existing contents into a new array and slows down the application. Using the size of the compressed data as the block size is a good compromise.
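The estimate above can be sketched as a small calculation. BlocksNeeded() is a hypothetical helper, and it assumes every read fills a whole block; a stream is allowed to return fewer bytes per read, in which case the real number of expansions would be higher.

```vbnet
Imports System

Module BlockCountExample
    ' Hypothetical helper: number of blocks of blockSize bytes needed to hold
    ' uncompressedSize bytes, rounded up.
    Function BlocksNeeded(ByVal uncompressedSize As Long, _
                          ByVal blockSize As Long) As Long
        If blockSize <= 0 Then Throw New ArgumentOutOfRangeException("blockSize")
        Return (uncompressedSize + blockSize - 1) \ blockSize
    End Function

    Sub Main()
        Dim megabyte As Long = 1024 * 1024

        ' 2MB compressed at a 20 percent ratio means 10MB uncompressed: 5 blocks
        Console.WriteLine(BlocksNeeded(10 * megabyte, 2 * megabyte))

        ' The same data with a 4KB block size needs 2560 blocks
        Console.WriteLine(BlocksNeeded(10 * megabyte, 4096))
    End Sub
End Module
```

The second call illustrates why a very small block size is costly: the array would be resized thousands of times for the same 10MB of output.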
