Steganography, literally "hidden writing," is nowadays most often associated with embedding data in some form of electronic media. Data is hidden by adding or altering insignificant bits of information in a file. For example, an algorithm designed to embed a text message might slightly alter the information describing the RGB composition of a pixel in an image file.
Figure 1 illustrates a typical steganography (or stego) application scenario. The application receives two inputs: the data to hide (text, audio, video, or image) and the file in which the data will be hidden, called the cover file. The result of the process is the stego file. Although it contains the original cover file data as well as the hidden steganographic information, the stego file is virtually identical to the cover file.
Figure 1. Stego Application Scenario: The stego application hides different types of data within a cover file. The resulting stego file contains hidden information, although it is virtually identical to the cover file.
This article introduces the most common steganography algorithms and techniques. It then shows how to design and implement a .NET library that hides text messages in 24-bit bitmap (.bmp) files. The sample code includes both a command-line and a GUI application that serve as proofs of concept and let you experiment with the techniques discussed.
Algorithms and Techniques
There are three different techniques you can use to hide information in a cover file:
Injection (or insertion). Using this technique, you store the data you want to hide in sections of a file that the processing application ignores. This way you avoid modifying the file bits that are relevant to an end user, leaving the cover file perfectly usable. For example, you can add harmless extra bytes to an executable or binary file. Because those bytes don't affect processing, the end user may not even realize that the file contains hidden information. However, injection increases the file size by the amount of data hidden, so if the file looks unusually large, it may arouse suspicion.
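As a rough illustration of injection, the Python sketch below appends a payload after a file's own bytes, followed by a small trailer so the payload can be found again. The trailer layout (a 4-byte length plus the marker `STG0`) and the function names `inject` and `extract_injected` are hypothetical choices for this sketch, not part of the article's library. Whether a given application really ignores trailing bytes depends on the file format and on the program reading it.

```python
import struct

# Hypothetical trailer for this sketch: payload, then its length as a
# 4-byte little-endian unsigned int, then a 4-byte marker.
MARKER = b"STG0"
TRAILER_SIZE = 4 + len(MARKER)


def inject(cover: bytes, payload: bytes) -> bytes:
    """Append the payload and trailer after the cover's own bytes."""
    return cover + payload + struct.pack("<I", len(payload)) + MARKER


def extract_injected(stego: bytes) -> bytes:
    """Locate the trailer written by inject() and return the payload."""
    if len(stego) < TRAILER_SIZE or not stego.endswith(MARKER):
        raise ValueError("no injected payload found")
    (length,) = struct.unpack("<I", stego[-TRAILER_SIZE:-len(MARKER)])
    end = len(stego) - TRAILER_SIZE
    if length > end:
        raise ValueError("corrupt trailer")
    return stego[end - length:end]


cover = b"original file contents"
stego = inject(cover, b"meet at noon")
print(len(stego) - len(cover))   # 20: 12 payload bytes + 8 trailer bytes
print(extract_injected(stego))   # b'meet at noon'
```

The original bytes are left untouched, but the stego file grows by exactly the payload size plus the trailer, which is the kind of size increase that can arouse suspicion.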
Substitution. Using this approach, you replace the least significant bits of the data that carries the file's meaningful content with new data, in a way that causes the least amount of distortion. The main advantage of this technique is that the cover file size does not change after the algorithm runs. On the other hand, the approach has at least two drawbacks. First, the resulting stego file may suffer quality degradation, which may arouse suspicion. Second, substitution limits the amount of data you can hide to the number of insignificant bits in the file.
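To make the capacity limit concrete, the sketch below computes how many bytes a substitution scheme can hide in the pixel data of a 24-bit bitmap (three color bytes per pixel) when it replaces `bits_per_channel` low-order bits of each color byte. The function name and the decision to ignore header and row-padding bytes are assumptions for illustration.

```python
def substitution_capacity(width: int, height: int, bits_per_channel: int = 1) -> int:
    """Bytes that fit in a 24-bit bitmap's pixel data.

    Assumes 3 color bytes (blue, green, red) per pixel and that only
    `bits_per_channel` low-order bits of each color byte are replaced.
    Header bytes and row padding are not used.
    """
    if not 1 <= bits_per_channel <= 8:
        raise ValueError("bits_per_channel must be between 1 and 8")
    color_bytes = width * height * 3
    return color_bytes * bits_per_channel // 8


print(substitution_capacity(640, 480))     # 115200 bytes with 1 bit per color byte
print(substitution_capacity(640, 480, 2))  # 230400 bytes, at the cost of more distortion
```

Doubling the number of replaced bits doubles the capacity, but it also raises the maximum change to each color value from 1 to 3, which makes degradation more likely to be noticed.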
Generation. Unlike injection and substitution, this technique doesn't require an existing cover file; instead, it generates a cover file for the sole purpose of hiding the message. The main flaw of the injection and substitution techniques is that someone can compare the stego file with any pre-existing copy of the cover file (which is supposed to be the same file) and discover differences between the two. You won't have that problem with a generation approach, because the result is an original file and is therefore immune to comparison tests.
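Generation techniques vary widely, so the following is only a toy sketch of the idea: instead of modifying an existing file, it builds a brand-new word sequence in which each word choice encodes one bit of the message. The word pairs and function names are hypothetical, and a practical generator would have to produce far more natural-looking output than this word list. The point it shows is that no original exists to compare the result against.

```python
# Hypothetical word pairs: the first word of a pair encodes bit 0, the second bit 1.
WORD_PAIRS = [
    ("big", "large"), ("quick", "fast"), ("begin", "start"), ("small", "little"),
    ("shut", "close"), ("buy", "purchase"), ("help", "assist"), ("end", "finish"),
]


def to_bits(data: bytes) -> list[int]:
    """Expand bytes into bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def generate_cover(message: bytes) -> str:
    """Build a new text from scratch; word i is chosen by bit i of the message."""
    words = []
    for i, bit in enumerate(to_bits(message)):
        pair = WORD_PAIRS[i % len(WORD_PAIRS)]
        words.append(pair[bit])
    return " ".join(words)


def decode_cover(text: str) -> bytes:
    """Recover the message by checking which word of each pair was used."""
    bits = [WORD_PAIRS[i % len(WORD_PAIRS)].index(word) for i, word in enumerate(text.split())]
    out = bytearray()
    for start in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


cover_text = generate_cover(b"a")
print(cover_text)                # big fast start small shut buy help finish
print(decode_cover(cover_text))  # b'a'
```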
Among the substitution techniques, a very popular method is the LSB (Least Significant Bit) algorithm, which replaces the least significant bit of selected bytes in the cover file with the bits of the data to hide. The technique is effective whenever LSB substitution doesn't cause significant quality degradation, as is the case with 24-bit bitmaps.
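For example, to hide the letter "a" (ASCII code 97, binary 01100001) inside eight bytes of a cover, you set the LSB of each of those bytes to the corresponding bit of 01100001, so that reading the eight LSBs in order yields 0, 1, 1, 0, 0, 0, 0, 1.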
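The following Python sketch carries out that example on eight cover bytes whose values are made up for illustration (in a 24-bit bitmap they would be color values). The helper names `embed_byte` and `extract_byte` are hypothetical and not taken from the article's .NET library.

```python
def embed_byte(cover: bytes, value: int) -> bytes:
    """Hide one byte in the LSBs of eight cover bytes, most significant bit first."""
    if len(cover) != 8 or not 0 <= value <= 255:
        raise ValueError("need eight cover bytes and a value in 0..255")
    stego = bytearray(cover)
    for i in range(8):
        bit = (value >> (7 - i)) & 1
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the message bit
    return bytes(stego)


def extract_byte(stego: bytes) -> int:
    """Rebuild the hidden byte from the LSBs of the first eight bytes."""
    value = 0
    for b in stego[:8]:
        value = (value << 1) | (b & 1)
    return value


cover = bytes([0x92, 0xDC, 0x57, 0x3B, 0xAA, 0xF1, 0x0E, 0x64])  # made-up color values
stego = embed_byte(cover, ord("a"))                               # 'a' = 97 = 0b01100001
print(stego.hex())                                # 92dd573aaaf00e65
print(chr(extract_byte(stego)))                   # a
print(sum(c != s for c, s in zip(cover, stego)))  # 4: the other four LSBs already matched
```

Each cover byte changes by at most 1, which is why the difference is so hard to see in a 24-bit color value.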
The application decoding the stego file reads the least significant bits of those eight bytes to re-create the hidden byte, 01100001, which is the letter "a." As you can see, this technique lets you hide one byte for every eight bytes of the cover. Note that there's a fifty percent chance that the bit you're replacing already equals its replacement; in other words, half the time the bit doesn't change at all, which helps minimize quality degradation.
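The fifty percent figure assumes that the cover's LSBs behave like random bits independent of the message; real image data only approximates this. Under that assumption, a quick Python simulation shows roughly half of the replaced bits flipping. The function name and the seed values are arbitrary choices for this sketch.

```python
import random


def changed_fraction(num_bytes: int, seed: int = 1) -> float:
    """Fraction of cover bytes whose LSB actually flips when random message bits are embedded."""
    rng = random.Random(seed)
    cover = [rng.randrange(256) for _ in range(num_bytes)]       # simulated color bytes
    message_bits = [rng.randrange(2) for _ in range(num_bytes)]  # one hidden bit per cover byte
    changed = sum((c & 1) != bit for c, bit in zip(cover, message_bits))
    return changed / num_bytes


print(changed_fraction(100_000))  # close to 0.5
```

Because each cover byte carries one message bit, the 100,000 simulated cover bytes correspond to 12,500 hidden bytes.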
The sample code uses the LSB algorithm. For further research, the Related Resources section of this article points to five additional approaches based on different techniques: Transform Domain, Spread Spectrum, Statistical Methods, Distortion, and Cover Generation.