RSS Feed
Download our iPhone app
Browse DevX
Sign up for e-mail newsletters from DevX


Back to Basics: Manage Collections with a Custom Dictionary

Dictionaries are an essential data structure in many of today's programming environments. Use this one in your next Brew application.

n Part 1 of this series, I showed you an elementary vector implementation for Brew as a stand-alone component you can use in your applications. This time, I give you a dictionary with a similar interface you can use in your Brew applications. You can download both the vector and hash here.

Introducing the Dictionary
The dictionary, also known as the associative array or hash, is a data structure that provides an interface for storing key-value associations. Typically, the name is a string, and the value may be anything, although in languages like C, you're hard-pressed to store multiple data types in a single dictionary unless you use unions with a type identifier so you know the type of a specific entry.

Most dictionaries are based on a vector and a hash function that is responsible for helping determine where each item should be inserted in the vector by operating on the item's key. For performance, my implementation does not reuse the vector implementation I presented previously, because the storage characteristics of the dictionary (more on this later) are different than your garden-variety vector. This implementation uses a collection of nodes each containing the name and item being stored, along with its hash for fast comparison:

typedef struct Node
  uint32 hash;
  char *szKey;
  void *pValue;
} Node;
Choosing a good hash function is tricky, and there's a lot of fact as well as folklore on the Internet. Theoretically, the problem is simple: you want a function that distributes the keys across the entire range of the collection's indexes in relatively few operations. In practice, this is quite difficult, and much has been written on the topic. I began with a naïve implementation that worked for testing, and after some research, selected the Jenkins one-at-a-time hash. Of course, even good hash functions occasionally return the same value for two different inputs (that is, the inputs collide). When that happens, you need a tie-breaker to decide which item should be stored where. For simplicity, I use linear probing (walking to the next empty slot) to find the next available spot; the alternative, keeping items with the same hash in the same location chained via a linked list, just seemed more work than it was worth to implement.

Author's Note: If you're interested in learning more about selecting hash functions, I encourage you to read the resources cited in Related Resources.

I should remark that Brew provides a similar component, the IRamCache component, based on the IRecordStore implementation. However, it uses a singly-linked list for its implementation, meaning that the search time for different elements can be grossly different. Moreover, it copies keys and values, making it impossible to store entries such as pointers to other Brew interfaces.

Close Icon
Thanks for your registration, follow us on our social networks to keep up-to-date