RSS Feed
Download our iPhone app
Browse DevX
Sign up for e-mail newsletters from DevX


Back to Basics: Manage Collections with a Custom Dictionary : Page 3

Dictionaries are an essential data structure in many of today's programming environments. Use this one in your next Brew application.

Inserting Items in the Dictionary
Inserting an item in a dictionary is more difficult than inserting it in a vector, because not only do you have to resize the dictionary when it becomes full, you also have to calculate where the item is going to reside. This involves hashing the key for the item as well as probing in the event of collisions.

While this article takes a bottom-up approach to writing the insertion function Dictionary_ReplaceAt, it's easier to understand looking at it from the top down, walking through the different operations it has to perform depending on the state of the dictionary (see Listing 3).

Begin with the usual error checking: validating the incoming instance and other arguments. NULL or zero-length keys are not premitted, either; it makes no sense to the dictionary. Next, grow the dictionary if it needs more space and delete any existing item at the same location before inserting a new node. This code isn't as efficient as it could be in the name of clarity, because Dictionary_Delete computes the hash for the object, something you'll do again only a few lines later. But insertions are more expensive than deletions, and this dictionary, like most, is tuned for fast fetches, rather than insertions (for a data structure that better balances the cost of both insertions and fetches, consider a balanced tree). Once the node is built up on the stack, insert it in the collection using Dictionary_Insert, which creates the new node based on the node you pass.

Growing the collection of items is a little trickier than with a vector, for two reasons. First, you don't want to wait until the collection is full—to do so would create lots of collisions as the dictionary fills. Instead, you want to grow the collection when it's a little less than full. Most agree that a good threshold is at 75 percent of capacity:

static __inline boolean Dictionary_NeedsMore( CDictionary *pMe )
  return (boolean) ( ( 4 * pMe->consumed ) / pMe->maskElements > 3 ); 
The second challenge is that when the collection grows, the location of each item changes, because its location is a function of the hash value. Consequently, when re-sizing the collection, all of the items must be re-inserted. This is very expensive, and while it doesn't occur very often, is one of the known performance penalties when using a hash table to store a dictionary (see Listing 4).

The first thing to note about how I'm growing the underlying collection is that I always double the size of the collection—that is, the size of the container for the collection is always a power of two. I do this for two reasons; first, many hash functions expect this, so this allows me to try different hash functions when I was initially developing the implementation. Second, it makes it exponentially less likely that another reallocation will be required. This model works well for the typical use of a dictionary, in which the client creates a dictionary, stashes away a bunch of name-value pairs, and then spends most of the time just querying the dictionary. The code allocates the new store, and if successful, iterates over the old store and re-inserts each existing node into the store, using the same insertion routine that Dictionary_ReplaceAt does.

Insertion is trickier, too, because if the hash function collides, you have to walk sequentially through the remaining nodes and wrap back around the beginning of the collection to find an empty slot (see Listing 5).

The code begins by computing the ideal location for the item based on its hash value. This operation explains why you use a bit mask instead of a count to store the number of elements in the collection; you can compute the appropriate position using a binary-AND, as opposed to the more expensive modulo. This is less important for insertion performance than it is for search performance, but you do the same in each case. Then, walk forward from that point on until you find a slot with no key; if you don't find one probing forward, begin again at the beginning of the collection and walk until you've exhausted all elements (you're guaranteed to find an empty element at some point during this search, because you reallocate the array when it's only 75 percent full).

Close Icon
Thanks for your registration, follow us on our social networks to keep up-to-date