s a kid, I was fascinated with secret codes. After seeing the movie, "The Miracle Worker" (every kid in the 60s and 70s saw that movie at school, I'm sure), my friends and I learned how to use the American Sign Language
finger spelling and spent months sending secret messages across the classroom while the teacher wasn't looking. For a while, my friends in religious school transliterated English words using the Hebrew alphabet as our own secret code. Yes, it's truewe were truly the geeks of the 60s. It's a tough job, too, because the Hebrew alphabet doesn't quite match up to the English alphabet, but nothing deterred the spies in training.
Maybe this is why hashing algorithms have always intrigued me. Back in the 80s, when I was teaching Data Structures using Pascal at Harvard Extensionteaching adults is way easier than teaching teenagersI loved it when we got to the hashing section. I even love the name (yummmm... "hash"; you'll just have to imagine that Homer Simpson voice I hear in my head). In case you've missed the fun, hashing takes a piece of data and converts it to some other form, normally an array of bytes or simply an integer, in such a way that the process is deterministic. That is, if you hash a given value multiple times, you always get the same output. On the other hand, hashing also provides no return route. Because two values might result in the same hash value, there's no way to determine the original data given its hashed value.
In my .NET experience, developers use hashing for two very different purposes. First, the concept of hash searching comes into play when you have a number of objects you'd like to store somewhere in memory, and later be able to retrieve one of the objects given a particular piece of information about the object (somewhat like a primary key value, when you store data into a database table). This concept plays out in the use of the HashTable data structure, and the GetHashCode
method supplied by the base Object type (and therefore, each and every object in the .NET Framework). Developers also use hashing in order to store encrypted values for comparison, such as passwords. For example, you might store a hashed password value in a database table, and when asked to validate a user, compare the hashed password against the hashed text the user has entered. If the values match, you know you've got the correct password. (This explains why many Web sites are unable to send you your current password; the best they can do is reset your password for you. Because they're only storing a hashed version of your password, they have no mechanism for sending you the original value. They can't retrieve it.)
The base Object class provides the GetHashCode
method and most classes override its behavior with some value that's meaningful for that class' data. The .NET documentation is pretty clear on how this method works. It indicates that for any class, the GetHashCode
method must have the following characteristics (and I quote):
- If two objects of the same type represent the same value, the hash function must return the same constant value for either object.
- For the best performance, a hash function must generate a random distribution for all input.
- The hash function must return exactly the same value regardless of any changes that are made to the object.
method provided by the String class, for example, returns an integer that represents some manipulation of the text stored within the string. The following code shows off the feature:
Dim x As String = "Hello"
' Displays 222703111
x = "Goodbye"
' Displays -1207123720
x = "Hello"
' Displays 222703111
method for a more complex type might base the hash value off of one or more fields or properties of an instance, or might provide a hash based on the address of the object.
Developers use hash tables for all sorts of purposes. For example, I recently got a call from a friend with this problem: He needed to be able to parse a text file, he needed to keep track of all the words in the file, and he needed to know how many times each word appeared. (This isn't a new problem. This sort of text concordance is something every developer in school has written.) Hash tables are perfect for this sort of operation.
You can think of a hash table as an array, containing buckets in which you store data. The data structure uses the hash value (that is, the integer 222703111 for the string "Hello") modulo the size of the hash table to determine the position in the table that the value will be stored. If you're writing the hash table algorithm yourself, you would need to worry about how the calculations are done, and you'd also need to consider what happens when two values "hash" to the same integer value and cause a collisionand that's sure to happen sooner or laterbut when you use the System.Collections.HashTable data structure, you needn't worry about such details; the .NET Framework handles all these issues. You add a value to the hash table using the Add method, specifying the key to be hashed along with the data to be stored. You can call the Contains method to determine if a specific key has already been added. Because the HashTable class implements the IEnumerable interface, you can iterate through the collection. Each item within the hash table is a DictionaryEntry type, and you can retrieve the Key
property of the DictionaryEntry to get back the data you stored.