One of the new namespaces in version 2.0 of the .NET Framework is System.IO.Compression. This namespace contains two classes for data compression: DeflateStream and GZipStream. Both classes support lossless compression and decompression and are designed to work with streams.
Tip: For more background information on how the Deflate and GZip algorithms work, refer to http://www.gzip.org/deflate.html.
Compression is useful for reducing the size of data. For example, if you have a huge amount of data to store in your SQL database, you can save disk space by compressing the data before saving it into a table. Moreover, since you are now saving smaller blocks of data into your database, the time spent performing disk I/O can be reduced as well. The downside of compression is that it takes additional processing power from your machine (and hence additional processing time), so you need to factor in this extra time before deciding to use compression in your application.
Compression is extremely useful when you need to transmit data over networks, especially slow and costly networks such as GPRS connections. In such cases, compression can drastically cut down the data size and reduce the overall cost of communication. Web services are another area where compression can provide a great advantage, since XML data is highly compressible.
Once you’ve decided the performance cost is worth it, you’ll need to know how to use .NET 2.0’s two new compression classes, which is what I’ll show in this article.
Creating the Sample Application
As I often do in my articles, I’ll build a sample application, which I’ll use to illustrate the use of compression. The application allows you to compress files as well as plain text. You will then be able to reuse the code snippets in this example for your own application.
Using Visual Studio 2005, create a new Windows application and populate the default Form1 with the following controls (see also Figure 1):
- GroupBox controls
- RadioButton controls
- TextBox controls
- Button controls
- Label controls
|Figure 1. Populate the Form: Populate the default Form1 with all the controls shown.|
Switch to the code-behind of Form1 and import the following namespaces:
Imports System.IO
Imports System.IO.Compression
Before you start to use the compression classes, it is important to understand how they work. The compression classes read data (to be compressed) from a byte array, compress it and store the results into a stream object. For decompression, the compressed data stored in a stream object is decompressed and then stored in another stream object.
First, define the Compress() function, which takes in two parameters: algo and data. The first parameter specifies which algorithm to use (GZip or Deflate) and the second parameter is a byte array that contains the data to compress. A memory stream object will be used to store the compressed data. Once the compression is done, you need to calculate the compression ratio, which is done by dividing the size of the compressed data by the size of the uncompressed data.
|Author’s Note: This compression ratio is sometimes expressed as a percentage. For example, if the uncompressed data size is 10MB and the compressed data size is 5MB, then the compression ratio is 50 percent. For efficient compression, you should aim for as low a compression ratio as possible. A compression ratio of more than 100 percent is no good; you are worse off than without compression.|
The compressed data stored in the memory stream is then copied into another byte array and returned to the calling function. In addition, you will use a Stopwatch object to keep track of how much time the compression algorithm takes. The Compress() function is defined as follows:
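The ratio arithmetic in the note above can be sketched on its own. This is a minimal illustration only; the module name RatioSketch and the helper CompressionRatio() are hypothetical and are not part of the sample application.

```vb
Imports System

Module RatioSketch
    '---hypothetical helper: compressed size as a percentage of the
    '   original size, rounded to two decimal places---
    Public Function CompressionRatio(ByVal originalSize As Long, _
                                     ByVal compressedSize As Long) As Double
        If originalSize <= 0 Then
            Throw New ArgumentOutOfRangeException("originalSize")
        End If
        Return Math.Round((compressedSize / originalSize) * 100, 2)
    End Function

    Sub Main()
        '---10MB compressed to 5MB: a ratio of 50 percent---
        Console.WriteLine(CompressionRatio(10 * 1024 * 1024, 5 * 1024 * 1024))
        '---100 bytes "compressed" to 120 bytes: a ratio of 120 percent,
        '   which is worse than not compressing at all---
        Console.WriteLine(CompressionRatio(100, 120))
    End Sub
End Module
```

A lower value means better compression; anything above 100 means the output grew.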
Public Function Compress(ByVal algo As String, ByVal data() As Byte) As Byte()
    Try
        Dim sw As New Stopwatch
        '---the ms is used for storing the compressed data---
        Dim ms As New MemoryStream()
        Dim zipStream As Stream = Nothing
        '---start the stopwatch---
        sw.Start()
        If algo = "Gzip" Then
            zipStream = New GZipStream(ms, CompressionMode.Compress, True)
        ElseIf algo = "Deflate" Then
            zipStream = New DeflateStream(ms, CompressionMode.Compress, True)
        End If
        '---compressing using the info stored in data---
        zipStream.Write(data, 0, data.Length)
        zipStream.Close()
        '---stop the stopwatch---
        sw.Stop()
        '---calculate the compression ratio---
        Dim ratio As Single = Math.Round((ms.Length / data.Length) * 100, 2)
        Dim msg As String = "Original size: " & data.Length & _
            ", Compressed size: " & ms.Length & _
            ", Compression ratio: " & ratio & "%" & _
            ", Time spent: " & sw.ElapsedMilliseconds & "ms"
        lblMessage.Text = msg
        ms.Position = 0
        '---used to store the compressed data (byte array)---
        Dim c_data(ms.Length - 1) As Byte
        '---read the content of the memory stream into the byte array---
        ms.Read(c_data, 0, ms.Length)
        Return c_data
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return Nothing
    End Try
End Function
The Decompress() function will decompress the data compressed by the Compress() function. The first parameter specifies the algorithm to use. The byte array containing the compressed data is passed in as the second parameter, which is then copied into a memory stream object. The compression classes will then decompress the data stored in the memory stream and then store the decompressed data into another stream object. In order to obtain the decompressed data, you need to read the data from the stream object. This is accomplished by the RetrieveBytesFromStream() function, which I’ll define next.
The Decompress() function is defined as follows:
Public Function Decompress(ByVal algo As String, ByVal data() As Byte) As Byte()
    Try
        Dim sw As New Stopwatch
        '---copy the data (compressed) into ms---
        Dim ms As New MemoryStream(data)
        Dim zipStream As Stream = Nothing
        '---start the stopwatch---
        sw.Start()
        '---decompressing using data stored in ms---
        If algo = "Gzip" Then
            zipStream = New GZipStream(ms, CompressionMode.Decompress)
        ElseIf algo = "Deflate" Then
            zipStream = New DeflateStream(ms, CompressionMode.Decompress, True)
        End If
        '---used to store the decompressed data---
        Dim dc_data() As Byte
        '---the decompressed data is stored in zipStream;
        '   extract them out into a byte array---
        dc_data = RetrieveBytesFromStream(zipStream, data.Length)
        '---stop the stopwatch---
        sw.Stop()
        lblMessage.Text = "Decompression completed. Time spent: " & _
            sw.ElapsedMilliseconds & "ms" & _
            ", Original size: " & dc_data.Length
        Return dc_data
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return Nothing
    End Try
End Function
The RetrieveBytesFromStream() function takes two parameters (a stream object and an integer) and returns a byte array containing the decompressed data. The integer parameter determines how many bytes to read from the stream object into the byte array at a time. This is necessary because you do not know the exact size of the decompressed data in the stream object in advance, so the byte array must be expanded dynamically, block by block, at runtime. Reserving too large a block wastes memory, while reserving too small a block loses valuable time as you repeatedly expand the byte array. It is therefore up to the calling routine to determine the optimal block size to read.
Define the RetrieveBytesFromStream() function as follows:
Public Function RetrieveBytesFromStream( _
    ByVal stream As Stream, ByVal bytesblock As Integer) As Byte()
    '---retrieve the bytes from a stream object---
    Dim data() As Byte
    Dim totalCount As Integer = 0
    Try
        While True
            '---progressively increase the size of the data byte array---
            ReDim Preserve data(totalCount + bytesblock)
            Dim bytesRead As Integer = stream.Read(data, totalCount, bytesblock)
            If bytesRead = 0 Then
                Exit While
            End If
            totalCount += bytesRead
        End While
        '---make sure the byte array contains exactly the number
        '   of bytes extracted---
        ReDim Preserve data(totalCount - 1)
        Return data
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return Nothing
    End Try
End Function
Recall that in the Decompress() function, you called the RetrieveBytesFromStream() function with the following:
dc_data = RetrieveBytesFromStream(zipStream, data.Length)
The block size is the size of the compressed data (data.Length). In most cases, the uncompressed data is a few times larger than the compressed data (as indicated by the compression ratio), so the byte array is expanded only a handful of times at runtime. As an example, suppose the compression ratio is 20 percent and the size of the compressed data is 2MB. In this case, the uncompressed data would be 10MB, and the byte array would be expanded about five times. Ideally, the byte array should not be expanded too frequently, because each expansion copies the existing contents into a new array and slows down the application. Using the size of the compressed data as the block size is a reasonable compromise.
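If repeatedly expanding the array is a concern, one alternative is to read into a fixed-size buffer and let a MemoryStream manage the growth. The sketch below is not the approach used in this article's sample; the module name StreamCopySketch, the function ReadAllBytesFromStream(), and the 4096-byte buffer are assumptions chosen for illustration.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression

Module StreamCopySketch
    '---hypothetical alternative to RetrieveBytesFromStream():
    '   drain a stream into a MemoryStream through a fixed-size buffer---
    Public Function ReadAllBytesFromStream(ByVal source As Stream, _
                                           ByVal bufferSize As Integer) As Byte()
        Dim chunk(bufferSize - 1) As Byte
        Using result As New MemoryStream()
            While True
                Dim bytesRead As Integer = source.Read(chunk, 0, chunk.Length)
                If bytesRead = 0 Then Exit While
                '---only the bytes actually read are appended---
                result.Write(chunk, 0, bytesRead)
            End While
            Return result.ToArray()
        End Using
    End Function

    Sub Main()
        '---round-trip some sample bytes through GZipStream---
        Dim original() As Byte = _
            System.Text.Encoding.ASCII.GetBytes(New String("a"c, 10000))
        Dim ms As New MemoryStream()
        Using zip As New GZipStream(ms, CompressionMode.Compress, True)
            zip.Write(original, 0, original.Length)
        End Using
        ms.Position = 0
        Using unzip As New GZipStream(ms, CompressionMode.Decompress)
            Dim restored() As Byte = ReadAllBytesFromStream(unzip, 4096)
            '---True when the restored length matches the original---
            Console.WriteLine(restored.Length = original.Length)
        End Using
    End Sub
End Module
```

With this approach the buffer size only affects how many Read() calls are made, not how often the result array is reallocated.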
Handling Compression Events
Now that the main compression and decompression routines are defined, you can code the event handler for the various buttons. The event handler for the Compress button is as follows:
Private Sub btnCompress_Click(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles btnCompress.Click
    '---used to store the compressed data---
    Dim compressedData() As Byte
    '---compress the data---
    If rbGZipStream.Checked Then
        compressedData = Compress("Gzip", _
            System.Text.Encoding.ASCII.GetBytes(txtBefore.Text))
    Else
        compressedData = Compress("Deflate", _
            System.Text.Encoding.ASCII.GetBytes(txtBefore.Text))
    End If
    '---copy the compressed data into a string for presentation---
    Dim i As Integer
    Dim s As New System.Text.StringBuilder()
    For i = 0 To compressedData.Length - 1
        If i <> compressedData.Length - 1 Then
            s.Append(compressedData(i) & " ")
        Else
            s.Append(compressedData(i))
        End If
    Next
    '---show the compressed data as a string---
    txtAfter.Text = s.ToString
End Sub
The data in the txtBefore control is converted into a byte array and then compressed. The compressed data is then converted to a string for display in txtAfter.
The event handler for the Decompress button is as follows:
Private Sub btnDecompress_Click(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles btnDecompress.Click
    '---format the compressed string into a byte array---
    Dim eachbyte() As String = txtAfter.Text.Split(" ")
    Dim data(eachbyte.Length - 1) As Byte
    For i As Integer = 0 To eachbyte.Length - 1
        data(i) = Convert.ToByte(eachbyte(i))
    Next
    '---decompress the data and show the decompressed data---
    If rbGZipStream.Checked Then
        txtBefore.Text = System.Text.Encoding.ASCII.GetString( _
            Decompress("Gzip", data))
    Else
        txtBefore.Text = System.Text.Encoding.ASCII.GetString( _
            Decompress("Deflate", data))
    End If
End Sub
It converts the data displayed in the txtAfter control into a byte array and then sends it for decompression. The decompressed data is displayed back in the txtBefore control.
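The two handlers above show the compressed bytes as space-separated decimal values. If you only need a compact, text-safe representation rather than a readable one, Base64 is a common alternative. The following is a sketch of that substitute technique, not what the sample application does; the module and function names are hypothetical.

```vb
Imports System

Module Base64Sketch
    '---hypothetical helpers: convert compressed bytes to text and back---
    Public Function BytesToText(ByVal data() As Byte) As String
        Return Convert.ToBase64String(data)
    End Function

    Public Function TextToBytes(ByVal text As String) As Byte()
        Return Convert.FromBase64String(text)
    End Function

    Sub Main()
        '---sample bytes standing in for compressed data---
        Dim compressedData() As Byte = {31, 139, 8, 0, 255}
        Dim asText As String = BytesToText(compressedData)
        Dim restored() As Byte = TextToBytes(asText)
        Console.WriteLine(asText)
        '---True when the round trip preserves the byte count---
        Console.WriteLine(restored.Length = compressedData.Length)
    End Sub
End Module
```

Base64 output is roughly a third larger than the raw bytes, which is still much shorter than the space-separated decimal form.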
The event handler for the Select file to compress button is as follows:
Private Sub btnSelectFile_Click(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles btnSelectFile.Click
    '---let user choose a file to compress---
    Dim openFileDialog1 As New OpenFileDialog()
    'openFileDialog1.InitialDirectory = "c:"
    openFileDialog1.Filter = "All files (*.*)|*.*"
    openFileDialog1.RestoreDirectory = True
    If openFileDialog1.ShowDialog() = Windows.Forms.DialogResult.OK Then
        '---read the content of the file into the byte array---
        Dim fileContents As Byte()
        fileContents = My.Computer.FileSystem.ReadAllBytes(openFileDialog1.FileName)
        '---create the gzip file---
        Dim filename As String = openFileDialog1.FileName & ".gzip"
        If File.Exists(filename) Then File.Delete(filename)
        Dim fs As FileStream = _
            New FileStream(filename, FileMode.CreateNew, FileAccess.Write)
        '---compress the content of file---
        Dim compressed_Data As Byte()
        If rbGZipStream.Checked Then
            compressed_Data = Compress("Gzip", fileContents)
        Else
            compressed_Data = Compress("Deflate", fileContents)
        End If
        If compressed_Data IsNot Nothing Then
            '---write the compressed content into the compressed file---
            fs.Write(compressed_Data, 0, compressed_Data.Length)
        End If
        '---always close the file, even if compression failed---
        fs.Close()
    End If
End Sub
It reads the content of the file selected by the user, compresses it, and creates a new file (with the same file name but with a .gzip extension appended) containing the compressed data.
The event handler for the Select file to decompress button is as follows:
Private Sub btnDecompressFile_Click(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles btnDecompressFile.Click
    '---let user choose a file to decompress---
    Dim openFileDialog1 As New OpenFileDialog()
    'openFileDialog1.InitialDirectory = "c:"
    openFileDialog1.Filter = "All GZIP files (*.gzip)|*.gzip"
    openFileDialog1.RestoreDirectory = True
    If openFileDialog1.ShowDialog() = Windows.Forms.DialogResult.OK Then
        '---read the content of the compressed file into byte array---
        Dim fileContents As Byte()
        fileContents = My.Computer.FileSystem.ReadAllBytes(openFileDialog1.FileName)
        '---decompress the content of file---
        Dim uncompressed_Data As Byte()
        If rbGZipStream.Checked Then
            uncompressed_Data = Decompress("Gzip", fileContents)
        Else
            uncompressed_Data = Decompress("Deflate", fileContents)
        End If
        '---create the decompressed file (strip the 5-character ".gzip")---
        Dim filename As String = _
            openFileDialog1.FileName.Substring( _
            0, openFileDialog1.FileName.Length - 5)
        If File.Exists(filename) Then File.Delete(filename)
        Dim fs As FileStream = _
            New FileStream( _
            filename, _
            FileMode.CreateNew, FileAccess.Write)
        If uncompressed_Data IsNot Nothing Then
            '---write the decompressed content into the file---
            fs.Write(uncompressed_Data, 0, uncompressed_Data.Length)
        End If
        '---always close the file, even if decompression failed---
        fs.Close()
    End If
End Sub
It reads the content of the file selected by the user, decompresses it, and creates a new file (by stripping its .gzip extension) containing the decompressed data.
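Both file handlers above load the entire file into a byte array before processing it. Because the compression classes are stream-based, a file can also be processed in chunks so that memory use stays roughly constant. The sketch below illustrates that idea under stated assumptions: the module name, the CompressFile() and DecompressFile() helpers, and the 64KB chunk size are hypothetical and are not part of the sample application.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression

Module FileStreamingSketch
    '---hypothetical helper: GZip-compress sourcePath into targetPath
    '   in 64KB chunks---
    Public Sub CompressFile(ByVal sourcePath As String, ByVal targetPath As String)
        Dim chunk(65535) As Byte
        Using inFile As New FileStream(sourcePath, FileMode.Open, FileAccess.Read)
            Using outFile As New FileStream(targetPath, FileMode.Create, FileAccess.Write)
                Using zip As New GZipStream(outFile, CompressionMode.Compress)
                    Dim bytesRead As Integer = inFile.Read(chunk, 0, chunk.Length)
                    While bytesRead > 0
                        zip.Write(chunk, 0, bytesRead)
                        bytesRead = inFile.Read(chunk, 0, chunk.Length)
                    End While
                End Using
            End Using
        End Using
    End Sub

    '---hypothetical helper: reverse of CompressFile()---
    Public Sub DecompressFile(ByVal sourcePath As String, ByVal targetPath As String)
        Dim chunk(65535) As Byte
        Using inFile As New FileStream(sourcePath, FileMode.Open, FileAccess.Read)
            Using unzip As New GZipStream(inFile, CompressionMode.Decompress)
                Using outFile As New FileStream(targetPath, FileMode.Create, FileAccess.Write)
                    Dim bytesRead As Integer = unzip.Read(chunk, 0, chunk.Length)
                    While bytesRead > 0
                        outFile.Write(chunk, 0, bytesRead)
                        bytesRead = unzip.Read(chunk, 0, chunk.Length)
                    End While
                End Using
            End Using
        End Using
    End Sub

    Sub Main()
        '---round-trip a temporary file and compare sizes---
        Dim original As String = Path.GetTempFileName()
        Dim packed As String = original & ".gzip"
        Dim restored As String = original & ".out"
        File.WriteAllText(original, New String("x"c, 200000))
        CompressFile(original, packed)
        DecompressFile(packed, restored)
        Console.WriteLine(File.ReadAllBytes(original).Length = _
                          File.ReadAllBytes(restored).Length)
    End Sub
End Module
```

The trade-off is that the compression ratio and timing shown in lblMessage would have to be computed separately, since this sketch does not go through the Compress() function.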
Testing the Application
Press F5 to test the application (see Figure 2).
|Figure 2. Testing the Application: Select the compression algorithm to use and then you can compress either a text string, or the content of a file.|
You should observe the following:
- Compressing a small amount of text will actually result in a larger compressed text.
- Different text will yield different compression ratios, even though the number of characters is constant.
- Text files compress the best; they yield the best compression ratio.
- Binary files such as .exe and .jpg files generally do not compress well and can result in compression ratios greater than 100 percent, in which case compression is counterproductive.
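The first observation can be checked outside the form with a short console sketch. The module name SmallInputSketch and the helper GZipSize() are hypothetical; the exact sizes printed depend on the framework version, so none are quoted here.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression

Module SmallInputSketch
    '---hypothetical helper: returns the GZip-compressed size of the data---
    Public Function GZipSize(ByVal data() As Byte) As Long
        Using ms As New MemoryStream()
            '---leave ms open so its length can be read after zip is closed---
            Using zip As New GZipStream(ms, CompressionMode.Compress, True)
                zip.Write(data, 0, data.Length)
            End Using
            Return ms.Length
        End Using
    End Function

    Sub Main()
        Dim shortText() As Byte = System.Text.Encoding.ASCII.GetBytes("Hello")
        Dim longText() As Byte = _
            System.Text.Encoding.ASCII.GetBytes(New String("a"c, 100000))
        '---a few bytes of input: the GZip header and trailer dominate---
        Console.WriteLine("Short: " & shortText.Length & " -> " & GZipSize(shortText))
        '---highly repetitive input: the output is much smaller---
        Console.WriteLine("Long: " & longText.Length & " -> " & GZipSize(longText))
    End Sub
End Module
```

The short input grows because the GZip format adds a fixed header and trailer regardless of how little data there is to compress.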
One important observation is that the implementations of the GZIP and Deflate algorithms in .NET are less efficient (in terms of compression ratios) than third-party GZIP utilities on the market. While you may be able to compress a 10MB file to 4MB using the .NET classes, you might find that you can get an even smaller compressed size using a third-party tool. Also, the compression classes cannot work with data larger than 4GB. However, the implementation in .NET will allow you to decompress files that have been compressed with the other GZIP utilities on the market.
In this article, you have seen how to use compression classes in .NET 2.0. While the implementation is not as efficient as those non-MS solutions in the marketplace, it does provide you with an easy (and free) way to incorporate compression capabilities into your .NET applications!