devxlogo

Use Specialty Arrays to Accelerate Your Code

Use Specialty Arrays to Accelerate Your Code

was surfing the web the other day, looking for Visual Basic questions to answer, when I stumbled across an old FAQ that explained why Visual Basic no longer has arrays with non-zero lower bounds. I don’t really buy the explanation given by Paul Vick at (it seems like the needs of the none outweighing the needs of the many to me) but it did start me thinking about arrays.

Visual Studio provides arrays of any data type. You can make one-, two-, and higher-dimensional arrays of integers, strings, objects, structures, or whatever. But there are a couple of useful kinds of arrays that Visual Studio doesn’t provide. This article explains how you can implement three of these: triangular arrays, sparse arrays, and arrays with non-zero lower bounds.

When you create a typical array, Visual Studio allocates memory for a block of items that includes one entry for every possible combination of indexes. For example, the following code allocates an array of Booleans containing 1 million entries:

   Dim is_connected(1000, 1000) As Boolean

In some applications, you don’t really need all these entries. For example, suppose the application tests connectivity in an electric, airline, or other network and the method is_connected(R, C) returns true if there is a link connecting location R with location C. Assuming the network is non-directional (so that a link between R and C also means there is a link between C and R), then this array contains a lot of duplicated information. If is_connected(R, C) is true then is_connected(C, R) is also true.

You can avoid storing all of this duplicated information and save half of the array’s space by using a triangular array. A triangular array stores only the values for is_connected(R, C) where R >= C. It deduces the values of other entries by switching the order of R and C.

The TriangularArray class described here saves this space by mapping the entries in the array into a one-dimensional storage array. Figure 1 shows a small triangular array on top and the linear storage of the array’s values on the bottom.

?
Figure 1. Triangle Tricks: The TriangularArray class stores entries for a triangular array (top) in a linear storage array (bottom).

The only tricky part is figuring out the index in the storage array that corresponds to an entry in the triangular array.

Consider an item in array position (R, C), where “R” is the row and “C” is the column. The rows above this one contain 1 + 2 + 3 + … + R entries. You may recall that there is a simple formula for this kind of sum: 1 + 2 + … + R = R * (R + 1) / 2.

Now notice that the number of items to the left of this one in its row is C, so the total number of items that come before this one is R * (R + 1) / 2 + C.

For example, consider item E in Figure 1. This item is at position (2, 1) in the triangular array (be sure to start counting from index 0), so its position in the storage array is 2 * (2 + 1) / 2 + 1 = 3 + 1 = 4, which you can verify in Figure 1.

The following code shows a TriangularArray class. I’ve removed some bounds checking code to make the code easier to read:

   Public Class TriangularArray(Of T)      Private m_Width As Integer      Private m_Values() As T         ' Allocate storage.      Public Sub New(ByVal width As Integer)         m_Width = width         ReDim m_Values(m_Width * (m_Width + 1)  2 - 1)      End Sub         ' Return the item at position (row, column).      Default Public Property Item( _       ByVal r As Integer, ByVal c As Integer)         Get            Return m_Values(ItemIndex(r, c))         End Get         Set(ByVal value)            m_Values(ItemIndex(r, c)) = value         End Set      End Property         ' Calculate the item's position in the array.      Private Function ItemIndex(ByVal r As Integer, _       ByVal c As Integer) As Integer         If r >= c Then Return r * (r + 1)  2 + c         Return c * (c + 1)  2 + r      End Function   End Class

The class’s constructor takes the array’s width as a parameter, and allocates enough space to hold all of the array’s items.

The Item property gets or sets an item in the array. It calls the ItemIndex method to convert the item’s row and column numbers into an index in the storage array and gets or sets the appropriate value.

The Default keyword makes Item the default property for the class. That means the main program can treat an object of the class as if it were an array and the “index” passed to the object is sent as a parameter to the property. (In C#, this property is the class indexer.)

The ItemIndex method simply returns R * (R + 1) / 2 + C as described earlier. If R < C, it swaps the roles of R and C.

The sample program TriangularArray (available with the downloadable code for this article in both Visual Basic and C#), demonstrates the class for the array shown in Figure 1. (For extra credit, try to figure out how to modify the class slightly to handle arrays that don’t need to contain items on the diagonal where R = C.)

Sparse Arrays
The TriangularArray class saves roughly half of an array’s space by avoiding storing duplicate values. In some applications, an array may contain very few values other than some default.

For example, suppose your application deals with connectivity in an airline network connecting 100 cities. The array entry is_connected(R, C) is true if there is a flight between city R and city C. You probably don’t have flights connecting all the 10,000 possible pairs of cities. In fact, you might only fly 200 flights between the cities; customers use connecting flights to go from city to city when no nonstop flight is available.

In this example, the array contains only 200 of 10,000 possible entries. Storing this data in a triangular array would save space but it would still contain 5,000 entries, 96 percent of which would be false.

You can make this array more efficient by using a sparse array. The idea is to store only the entries that have a non-default value. Whenever you need to look up an entry, you assume that any missing item has the default value.

Classically, you would build a sparse array with a linked data structure. A Row class contains a row number, a link to the next row, and a link to the first item in the row. The Item class contains a column number, the data for that entry, and a link to the next entry in the row.

To find an item, you search through the linked list of row objects until you find the row number you want. Then you search through that row’s linked list of items until you find the column number you want and then you return that item’s data. If you can’t find the row, or can’t find the item in the row, you return the default value.

To set an item’s value, you first try to find it in same way. If the item isn’t present, you add it (possibly creating a new row if the item’s row wasn’t there). Then you set the item’s value.

Finally, to set an item to the default value, you find it and remove it from the data structure.

All of this is an interesting character-building exercise and I encourage you to give it a try in your spare time, but the .NET Framework’s Dictionary class gives you a much simpler (easier to understand and debug) alternative. The following code shows a SparseArray class that uses a Dictionary object:

   Public Class SparseArray(Of T As IComparable)      Private m_Dict As Dictionary(Of String, T)      Private m_Default As T         Public Sub New(ByVal default_value As T)         m_Dict = New Dictionary(Of String, T)         m_Default = default_value      End Sub         ' Get or set this item.      Default Public Property Item( _       ByVal r As Integer, ByVal c As Integer) As T         Get            Dim key As String = r & "," & c            If m_Dict.ContainsKey(key) Then Return m_Dict(key)            Return m_Default         End Get         Set(ByVal value As T)            Dim key As String = r & "," & c            If value.Equals(m_Default) Then               m_Dict.Remove(key)            Else               m_Dict(key) = value            End If         End Set      End Property   End Class

This is a generic class based on a type T that implements the IComparable interface. The class starts by defining a generic Dictionary object that uses strings for keys and stores objects of type T. The class’s constructor creates the Dictionary and saves the sparse array’s default value.

The Item property uses the Dictionary to get and set values. The Get procedure composes a string key for the item and determines whether the item is in the Dictionary. If the item is present, the procedure returns its value. If the item is not in the Dictionary, the procedure returns the default value.

The Set procedure also composes a key for the item. Then, if the item’s new value equals the default value (T must implement IComparable, so the test will always work), the procedure removes the item from the dictionary. If the new value is not the default value, the procedure sets the new value in the Dictionary.

The SparseArray program in the downloadable code demonstrates the SparseArray class.

Non-Zero Lower Bounds
Once upon a time, Visual Basic allowed you to create arrays with non-zero lower bounds. That way, if you wanted to store decimal sales figures for the years 1991 through 2010, you could create an array with corresponding bounds like this:

   Dim sales(1991 To 2010) As Decimal

Unfortunately in Visual Studio .NET (both C# and Visual Basic), all arrays use zero for their lower bounds. (As a largely symbolic concession, Microsoft now allows you to explicitly declare the lower bound in Visual Basic as long as it’s zero. For example, you can write: Dim people(0 To 100)?but you still can’t have non-zero lower bounds.)

Although Visual Studio doesn’t allow arrays with non-zero lower bounds, you can work around that restriction with a little simple subtraction.

For example, suppose you want to build an array of sales figures for the years 1991 through 2010. You can save the values in a zero-based array and subtract 1991 from the indexes you use to get and fetch values. For example, you would place the value for the year 2000 in array entry 2000?1991 = 9. The following code makes an array big enough to hold the values and then places the value 32,000 in the entry for 2000:

   Dim sales(2010 - 1991) As Decimal   sales(2000 -- 1991) = 32000

Unfortunately the extra subtraction makes your code slightly harder to read. The following OneDArray class wraps up the details to protect those who are math phobic. Notice that the class is generic, so you can use it to make an array of strings, decimals, pickles, politicians, or whatever (error-checking code that verifies bounds and indexes omitted for brevity):

   Public Class OneDArray(Of T)      Private m_LowerBound As Integer      Private m_UpperBound As Integer      Private m_Values() As T         Public Sub New(ByVal lower_bound As Integer, _         ByVal upper_bound As Integer)         ' Save the bounds.         m_LowerBound = lower_bound         m_UpperBound = upper_bound            ' Create the storage array.         ReDim m_Values(m_UpperBound - m_LowerBound)      End Sub         Default Public Property Item(ByVal index As Integer) As T         Get            Return m_Values(index - m_LowerBound)         End Get         Set(ByVal Value As T)            m_Values(index - m_LowerBound) = Value         End Set      End Property   End Class

The constructor saves the desired upper and lower bounds and makes a normal array to hold the values. The Item property subtracts the lower bound from the index to find the desired entry in the value array.

The Default keyword added to the property definition makes this the default property (the class indexer in C#) for the class. Again, that means a program can treat an object of the class as if it were an array and the “index” passed to the object is sent as a parameter to the property.

The following code shows the previous example rewritten to use the OneDArray class. Notice how the second line treats the sales object as if it were an array:

   Dim sales As New OneDArray(Of Integer)(1991, 2010)   sales(2000) = 32000

This idea is easy to generalize for higher-dimensional arrays. You simply save the bounds for all dimensions and subtract them from the indexes used to get and set values in the array. The following code shows the key parts of the TwoDArray class, which implements a two-dimensional array with non-zero lower bounds:

   Public Class TwoDArray(Of T)      Private m_LowerBound0 As Integer      Private m_LowerBound1 As Integer      Private m_UpperBound0 As Integer      Private m_UpperBound1 As Integer      Private m_Values(,) As T         Public Sub New( _       ByVal lower_bound0 As Integer, _       ByVal upper_bound0 As Integer, _       ByVal lower_bound1 As Integer, _       ByVal upper_bound1 As Integer)         ' Save the bounds.         m_LowerBound0 = lower_bound0         m_LowerBound1 = lower_bound1         m_UpperBound0 = upper_bound0         m_UpperBound1 = upper_bound1            ' Create the two-dimensional storage array.         ReDim m_Values( _            m_UpperBound0 - m_LowerBound0, _            m_UpperBound1 - m_LowerBound1)      End Sub         Default Public Property Item( _       ByVal index0 As Integer, ByVal index1 As Integer) As T         Get            Return m_Values(index0 - m_LowerBound0, _               index1 - m_LowerBound1)         End Get         Set(ByVal Value As T)            m_Values(index0 - m_LowerBound0, _               index1 - m_LowerBound1) = Value         End Set      End Property   End Class

As a usage example, suppose you want to save sales figures for each of the four quarters in the years 1991 through 2010. The following code creates a TwoDArray object to represent the year in its first dimension and the fiscal quarter in the second. It then sets the third quarter sales figure for the year 2000 to 8,000:

   Dim sales As New TwoDArray(Of Decimal)(1991, 2010, 1, 4)   sales(2000, 3) = 8000

You can extend this idea in a similar manner to accommodate three, four, or more dimensions. Unfortunately, subtraction alone isn’t enough to make a single general class that can handle any number of dimensions the way the Array class can. To do that, you need to use some higher mathematics: multiplication and addition.

Moving to Higher Dimensions
Before you see the code for this class, you should understand how to lay out a multi-dimensional array in memory. (In fact, this is the same way compilers do it.) For now, assume all of the dimensions’ lower bounds are zero and we’ll remove that restriction later.

This example lays out its values in row-major order. That means all the elements in the early dimensions are grouped together in memory. This makes the most sense if you think of a two-dimensional array laid out in a one-dimensional stretch of memory as shown in Figure 2. The top of the figure shows the array in its two-dimensional logical form. The bottom shows the array’s items placed in a one-dimensional memory.

?
Figure 2. Two for One: The two-dimensional array at the top of the figure is laid out in at the bottom in one-dimensional memory, in row-major order.

In Figure 2, you can see that the items in each of the array’s rows are adjacent to each other in memory.

To calculate the location of an item in the one-dimensional memory, you multiply the item’s row number (starting at zero) by the number of items in a row, and then add the item’s column number (again starting with zero). The array in Figure 2 has four items in each row so, for example, the item G (at row 1 column 2) has memory location 1 * 4 + 2 = 6. You can verify this by counting the entries at the bottom (be sure to start counting from zero.)

Now consider the three-dimensional case shown in Figure 3. The top shows a three-dimensional 3 x 2 x 2 array where the dimensions sequentially run left-to-right, top-to-bottom, and front-to-back. For example, item F is at position (2, 1, 0).

?
Figure 3. Higher Dimensions: Items in higher-dimensional arrays are packed first in one-dimensional groups and then in higher-dimensional groups.

In the one-dimensional representation at the bottom of Figure 3, items are first placed in one-dimensional groups. Those are combined to form two-dimensional groups, which are combined in turn for form three-dimensional groups. For higher-dimensional arrays, the process would continue.

Now you can generalize the method used earlier for finding an item in a two-dimensional array to this higher-dimensional version. The idea is to determine the number of each type of grouping that comes before an item in memory.

For example, consider item L in the lower representation in Figure 3. To the immediate left of item L, there are two single items J and K. To the left of those items is a one-dimensional grouping containing items G through I. To the left of that group is a two-dimensional grouping containing items A through F. To find the location of item L, you add up the sizes of these groupings to the left: 2 * 1 + 1 * 3 + 1 * 6 = 11. You can verify that this is the location of item L in the lower part of Figure 3.

More generally, suppose you are working with an N-dimensional array where the dimensions have lengths L1, L2, and so forth. (In Figure 3, the array’s dimensions are L1 = 3, L2 = 2, and L3 = 2.) Knowing the dimension lengths, you can calculate the sizes of the various groupings. These sizes are L1 for a one-dimensional grouping, L1 * L2 for a two-dimensional grouping, L1 * L2 * L3 for a three-dimensional grouping, and so forth. (In Figure 3, the sizes are 3, 3 * 2 = 6, and 3 * 2 * 2 = 12.)

Now suppose an item has indexes I1, I2, I3, and so forth. Then the layout has I1 of the smallest groupings before it in the memory representation, I2 of the second smallest groupings before it, and so forth.

In the previous example, item L has indexes (2, 1, 1) so it comes after two of the smallest groupings (one item each), one of the next largest grouping (three items), and one of the largest grouping (six items) for a total of 2 * 1 + 1 * 3 + 1 * 6 = 11 items before item L.

This all seems rather complex but the MultiDArray class shown in Listing 1 is actually relatively short and simple if you can keep the general ideas straight in your mind. (Again, I have removed bounds checking code to keep the code simple.) The class uses private variables to record the array’s lower bounds and the length of each dimension. The m_SkipFactors array contains the sizes of the various groupings in order from largest to smallest so the program can easily calculate an item’s index. The m_Values array holds the data.

The class’s constructor saves the lower bounds and dimension lengths. It then calculates the skip factors. The first skip factor is 1. Others are the sizes of the groupings?in increasing order so they match up with the indexes of an item. For the previous example, the skip factors are 1, 3, and 6. The constructor finishes by creating a one-dimensional array to hold all the values.

The Item property takes as parameters the indexes of the item that should be get or set. Visual Basic doesn’t allow you to use the Default keyword on a property with all optional parameters so index0 is declared separately. (This isn’t a restriction in C# so the code is a little more elegant.) The Item property calls the ItemIndex method to calculate the index of the item in the storage array and uses that index to get or set the value.

The ItemIndex method first calculates the position of an item in the storage array. It copies the indexes into a single array to make working with them easier and then loops through the indexes. For each index, it subtracts that dimension’s lower bound (here’s where we get back to non-zero lower bounds), multiplies by the dimension’s skip factor, and adds the result to the total offset into the storage array.

The following code how a program can use the MultiDArray class to build an array similar to the one shown in Figure 3:

?
Figure 4. Time Trials: With zero lower bounds, nothing beats a normal array. But for non-zero lower bounds, the OneDArray, TwoDArray, and MultiDArray classes leave the Array class in the dust.
   Dim lower_bounds() As Integer = {0, 0, 0}   Dim lengths() As Integer = {3, 2, 2}   Dim arr As New MultiDArray( _      Of String)(lower_bounds, lengths)   arr(0, 0, 0) = "A"   arr(1, 0, 0) = "B"   arr(2, 0, 0) = "C"   arr(0, 1, 0) = "D"   arr(1, 1, 0) = "E"   arr(2, 1, 0) = "F"   arr(0, 0, 1) = "G"   arr(1, 0, 1) = "H"   arr(2, 0, 1) = "I"   arr(0, 1, 1) = "J"   arr(1, 1, 1) = "K"   arr(2, 1, 1) = "L"

In the sample code the MultiDArray program example demonstrates the OneDArray, TwoDArray, and MultiDArray classes. Figure 4 shows the program in action. As you can see, a normal array is much faster than any other solution if you can use zero lower bounds. But if you need non-zero lower bounds, the Array class compares pretty badly to the OneDArray, TwoDArray, and MultiDArray classes. The more specialized OneDArray and TwoDArray classes are considerably faster than MultiDArray but they don’t have as much flexibility. If your project requirements change so you suddenly need a seventeen-dimensional array, you can immediately plug in a MultiDArray class, and later, if there’s a need, you can build a customized array class just to handle this odd case.

devxblackblue

About Our Editorial Process

At DevX, we’re dedicated to tech entrepreneurship. Our team closely follows industry shifts, new products, AI breakthroughs, technology trends, and funding announcements. Articles undergo thorough editing to ensure accuracy and clarity, reflecting DevX’s style and supporting entrepreneurs in the tech sphere.

See our full editorial policy.

About Our Journalist