By submitting your information, you agree that devx.com may send you DevX offers via email, phone and text message, as well as email offers about other products and services that DevX believes may be of interest to you. DevX will process your information in accordance with the Quinstreet Privacy Policy.

MSD RadixSort - An algorithm that can achieve O(N) behavior

' MSD RadixSort. No sort based on comparisons can be faster than O(N log N).
' RadixSort makes no comparisons and can therefore achieve O(N) behavior. To
' do this, it examines keys one byte at a time, counting the number of keys
' that have each possible byte value. The counts are then used to build an
' offset table specifying the sorted order of the keys. It's easiest to work
' backwards from the least significant digit of the keys (LSD Radix),
' since order based on more significant digits is not disturbed by less
' significant digits. Unfortunately, the LSD approach requires padding short
' keys if key length is variable, and guarantees that all digits will be
' examined even if the first 3-4 digits contain all the information needed to
' achieve sorted order. Most significant digit (MSD) RadixSort takes a lot
' more bookkeeping -- the list must repeatedly be split into sublists for each
' value of the last digit processed -- but the pay-off is that only as many
' digits will be examined as are needed. As in QuickSort and MergeSort,
' it's worthwhile to hand off the sublists to InsertionSort when they get
' short enough.
'
' MSD RadixSort is stable and runs in linear O(N) time. It is fairly memory
' intensive, needing space for some extra counting arrays and stack space for
' recursive calls, but still uses less memory than MergeSort. This version is
' set up for strings and would take some work to adapt to other data types.
' Integers and longs could be handled by converting them to strings (and
' limiting the arrays CNT and IND to the ten numerical digit values plus the
' minus sign); this is probably worthwhile only if you have hundreds of
' thousands of integers or longs to sort. Doubles are more of a challenge,
' requiring conversion to an array of bytes by type casting. I have not
' pursued this, since VBA on the Mac does not include the CopyMemory function
' that enables type casting on Windows systems.
'
' NOTE: This version is set up to count byte values from 1 to 127. If your
' keys use (for instance) only lowercase or only uppercase alphabetic
' characters or only numerical digits, you can trim the CNT and IND arrays and
' your sorts will run correspondingly faster.
'
' Reference: P. M. McIlroy, K. Bostic and M. D. McIlroy,
' "Engineering Radix Sort", Computing Systems 6(1):5-27 (1993).
'
' Speed: MSDRadixSortS sorts 500,000 random strings in 28.3 sec; sorts 100186
' library call numbers in 21.3 sec; sorts 25479 dictionary words in 1.8 sec
' (random order), 1.5 sec (presorted) or 1.9 sec (reverse sorted). Timed in
' Excel 2001 on an 800 mhz PowerBook.
'
' Bottom line: complex and best suited to strings, but there's nothing faster
' for really long lists.
' Usage:
Dim S1(L To R) As Strings
Dim B1(1 To nChars) As Byte
Dim P1(L To R) As Long
For I = L To R
S1(I) = GetRandomString()
Next I
StrsToBytes S1, B1, P1 'a routine that stores the strings in 0 terminated
' byte arrays with
'P1() holding pointers to the start of each byte series.
MSDRadixSortS L, R, B1, P1
' CODE:
Sub MSDRadixSortS(N As Long)
Dim CH() As Integer
ReDim CH(1 To N)
'See below for type PILE (stack records)
ReDim STACK(1 To 1000)
StackPtr = 1
With STACK(StackPtr)
.L = 1
.R = N
.D = 0
End With
StackPtr = StackPtr + 1
RadixS CH
End Sub
Sub RadixS(CH() As Integer)
Dim L As Long
Dim R As Long
Dim D As Integer
Dim I As Long
Dim J As Long
Dim TMP As Long
Dim C As Integer
Dim NextC As Integer
Dim CNT(-1 To 127) As Long
Dim IND(-1 To 127) As Long
Dim NuCnt As Long
Dim BigCnt As Long
Dim OldSp As Long
Dim BigSp As Long
While StackPtr > 1
'Pop a PILE off the stack.
StackPtr = StackPtr - 1
'Get left and right limits of sublist and depth of byte to examine
With STACK(StackPtr)
L = .L
R = .R
D = .D
End With
'Sublists of <= 24 keys will be finished by InsertionSort
If R - L > 24 Then
'Clear the count array.
For I = -1 To 127
CNT(I) = 0
Next I
'Get the byte at depth D in each key and count the number of each byte
' value.
For I = L To R
C = CInt(B(P(I) + D))
CH(I) = C
CNT(C) = CNT(C) + 1
Next I
'We will add the counts to create sorted addresses in array IND().
IND(-1) = L
'At the same time we'll push the sublists for each byte value onto the
' stack.
OldSp = StackPtr
'We'll track the biggest sublist and make sure it comes off the stack
' last;
'this ensures the stack will not get more than logarithmically deep.
BigCnt = 0
'Now we build the addresses out of the counts.
For I = 0 To 127
J = I - 1
NuCnt = CNT(J)
IND(I) = IND(J) + NuCnt
'If there is a sublist of keys starting with the current byte value,
' stack it.
If NuCnt > 1 Then
With STACK(StackPtr)
.L = IND(J)
.R = IND(I) - 1
.D = D + 1
End With
StackPtr = StackPtr + 1
'Keep track of the largest count / sublist.
If NuCnt > BigCnt Then
BigCnt = NuCnt
BigSp = StackPtr
End If
End If
Next I
'Swap the biggest sublist down into the stack so it comes off last.
TMP = BigSp
BigSp = OldSp
OldSp = TMP
'Now use the counts to move the pointers to their sorted positions
' based on the
'bytes examined so far; doing this in place this gets a bit ugly.
For I = L To R
'We will use the byte at I to find what address P(I) should be mapped
' to.
C = CH(I)
'We use -1 to flag pointers already moved.
CH(I) = -1
'If C = -1 we skip the loop and increment I until we find a pointer
' not already moved.
Do While C >= 0
'We go to IND(C) to get the destination address for P(I).
J = IND(C)
'We swap the current pointer for the one at that address.
TMP = P(I)
P(I) = P(J)
P(J) = TMP
'Now we determine where to map the pointer we just displaced.
'We get the byte value from its key.
NextC = CH(J)
'We flag it as moved.
CH(J) = -1
'Once we've used each address we increment it unless we've hit R.
If J < R Then IND(C) = J + 1 Else Exit Do
'We set C to the byte of the key of the displaced pointer; ready
' to loop!
C = NextC
Loop
'If a series of displacements circles back to a pointer already moved,
' we
'increment I until we find a pointer not yet moved or until we end at
' R.
Next I
Else
DeepInsertS B, P, L, 1 + R - L, D
End If
Wend
End Sub
Type PILE
L As Long
R As Long
D As Integer
End Type
Dim STACK() As PILE
Dim StackPtr As Integer
Sub DeepInsertS(B() As Byte, P() As Long, L As Long, N As Long, D As Integer)
Dim LP As Long
Dim RP As Long
Dim TMP As Long
Dim I As Long
Dim J As Long
For RP = L + 1 To L + N - 1
TMP = P(RP)
For LP = RP To L + 1 Step -1
I = TMP + D
J = P(LP - 1) + D
Do While B(I) = B(J)
If B(I) = 0 Or B(J) = 0 Then Exit Do
I = I + 1
J = J + 1
Loop
If CInt(B(I)) - CInt(B(J)) < 0 Then P(LP) = P(LP - 1) Else Exit For
Next LP
P(LP) = TMP
Next RP
End Sub