Browse DevX
Sign up for e-mail newsletters from DevX

Tip of the Day
Language: VB5, VB6
Expertise: Advanced
Jan 13, 2003



Building the Right Environment to Support AI, Machine Learning and Deep Learning

MSD RadixSort - An algorithm that can achieve O(N) behavior

' MSD RadixSort.  No sort based on comparisons can be faster than O(N log N).  
' RadixSort makes no comparisons and can therefore achieve O(N) behavior.  To 
' do this, it examines keys one byte at a time, counting the number of keys 
' that have each possible byte value.  The counts are then used to build an 
' offset table specifying the sorted order of the keys.  It's easiest to work 
' backwards from the least significant digit of the keys (LSD Radix),
'  since order based on more significant digits is not disturbed by less 
' significant digits.  Unfortunately, the LSD approach requires padding short 
' keys if key length is variable, and guarantees that all digits will be 
' examined even if the first 3-4 digits contain all the information needed to 
' achieve sorted order.  Most significant digit (MSD) RadixSort takes a lot 
' more bookkeeping -- the list must repeatedly be split into sublists for each 
' value of the last digit processed -- but the pay-off is that only as many 
' digits will be examined as are needed.  As in QuickSort and MergeSort,
'  it's worthwhile to hand off the sublists to InsertionSort when they get 
' short enough.
' MSD RadixSort is stable and runs in linear O(N) time.  It is fairly memory 
' intensive, needing space for some extra counting arrays and stack space for 
' recursive calls, but still uses less memory than MergeSort.  This version is 
' set up for strings and would take some work to adapt to other data types.  
' Integers and longs could be handled by converting them to strings (and 
' limiting the arrays CNT and IND to the ten numerical digit values plus the 
' minus sign); this is probably worthwhile only if you have hundreds of 
' thousands of integers or longs to sort.  Doubles are more of a challenge,
'  requiring conversion to an array of bytes by type casting.  I have not 
' pursued this, since VBA on the Mac does not include the CopyMemory function 
' that enables type casting on Windows systems.  
' NOTE:  This version is set up to count byte values from 1 to 127.  If your 
' keys use (for instance) only lowercase or only uppercase alphabetic 
' characters or only numerical digits, you can trim the CNT and IND arrays  and 
' your sorts will run correspondingly faster.
' Reference:  P. M. McIlroy, K. Bostic and M. D. McIlroy,
'  "Engineering  Radix Sort", Computing Systems 6(1):5-27 (1993).
' Speed:  MSDRadixSortS sorts 500,000 random strings in 28.3 sec; sorts 100186 
' library call numbers in 21.3 sec; sorts 25479 dictionary words in 1.8 sec 
' (random order), 1.5 sec (presorted) or 1.9 sec (reverse sorted).  Timed in 
' Excel 2001 on an 800 mhz PowerBook.
' Bottom line:  complex and best suited to strings, but there's nothing faster 
' for really long lists.

' Usage:  

Dim S1(L To R) As Strings
Dim B1(1 To nChars) As Byte
Dim P1(L To R) As Long

For I = L To R
    S1(I) = GetRandomString()
Next I

StrsToBytes S1, B1, P1    'a routine that stores the strings in 0 terminated 
                          ' byte arrays with
                    'P1() holding pointers to the start of each byte series.

MSDRadixSortS L, R, B1, P1


Sub MSDRadixSortS(N As Long)
    Dim CH() As Integer
    ReDim CH(1 To N)  
    'See below for type PILE (stack records)
    ReDim STACK(1 To 1000)
    StackPtr = 1
    With STACK(StackPtr)
        .L = 1
        .R = N
        .D = 0
    End With
    StackPtr = StackPtr + 1
    RadixS CH
End Sub

Sub RadixS(CH() As Integer)
    Dim L As Long
    Dim R As Long
    Dim D As Integer
    Dim I As Long
    Dim J As Long
    Dim TMP As Long
    Dim C As Integer
    Dim NextC As Integer
    Dim CNT(-1 To 127) As Long
    Dim IND(-1 To 127) As Long
    Dim NuCnt As Long
    Dim BigCnt As Long
    Dim OldSp As Long
    Dim BigSp As Long
    While StackPtr > 1
    'Pop a PILE off the stack.
        StackPtr = StackPtr - 1
    'Get left and right limits of sublist and depth of byte to examine
        With STACK(StackPtr)
            L = .L
            R = .R
            D = .D
        End With
    'Sublists of <= 24 keys will be finished by InsertionSort
        If R - L > 24 Then
        'Clear the count array.
            For I = -1 To 127
                CNT(I) = 0
            Next I
        'Get the byte at depth D in each key and count the number of each byte 
        ' value.
            For I = L To R
                C = CInt(B(P(I) + D))
                CH(I) = C
                CNT(C) = CNT(C) + 1
            Next I
        'We will add the counts to create sorted addresses in array IND().
            IND(-1) = L
        'At the same time we'll push the sublists for each byte value onto the 
        ' stack.
            OldSp = StackPtr
        'We'll track the biggest sublist and make sure it comes off the stack 
        ' last;
        'this ensures the stack will not get more than logarithmically deep.
            BigCnt = 0
        'Now we build the addresses out of the counts.
            For I = 0 To 127
                J = I - 1
                NuCnt = CNT(J)
                IND(I) = IND(J) + NuCnt
         'If there is a sublist of keys starting with the current byte value,
         '  stack it.
                If NuCnt > 1 Then
                    With STACK(StackPtr)
                        .L = IND(J)
                        .R = IND(I) - 1
                        .D = D + 1
                    End With
                    StackPtr = StackPtr + 1
             'Keep track of the largest count / sublist.
                    If NuCnt > BigCnt Then
                        BigCnt = NuCnt
                        BigSp = StackPtr
                    End If
                End If
            Next I
        'Swap the biggest sublist down into the stack so it comes off last.
            TMP = BigSp
            BigSp = OldSp
            OldSp = TMP
        'Now use the counts to move the pointers to their sorted positions 
        ' based on the
        'bytes examined so far; doing this in place this gets a bit ugly.
            For I = L To R
         'We will use the byte at I to find what address P(I) should be mapped 
         ' to.
                C = CH(I)
         'We use -1 to flag pointers already moved.
                CH(I) = -1
         'If C = -1 we skip the loop and increment I until we find a pointer 
         ' not already moved.
                Do While C >= 0
             'We go to IND(C) to get the destination address for P(I).
                    J = IND(C)
             'We swap the current pointer for the one at that address.
                    TMP = P(I)
                    P(I) = P(J)
                    P(J) = TMP
             'Now we determine where to map the pointer we just displaced.
             'We get the byte value from its key.
                    NextC = CH(J)
             'We flag it as moved.
                    CH(J) = -1
             'Once we've used each address we increment it unless we've hit R.
                    If J < R Then IND(C) = J + 1 Else Exit Do
             'We set C to the byte of the key of the displaced pointer; ready 
             ' to loop!
                    C = NextC
         'If a series of displacements circles back to a pointer already moved,
         '  we 
         'increment I until we find a pointer not yet moved or until we end at 
         ' R.
            Next I
            DeepInsertS B, P, L, 1 + R - L, D
        End If
End Sub

    L As Long
    R As Long
    D As Integer
End Type

Dim StackPtr As Integer

Sub DeepInsertS(B() As Byte, P() As Long, L As Long, N As Long, D As Integer)
    Dim LP As Long
    Dim RP As Long
    Dim TMP As Long
    Dim I As Long
    Dim J As Long
    For RP = L + 1 To L + N - 1
        TMP = P(RP)
        For LP = RP To L + 1 Step -1
            I = TMP + D
            J = P(LP - 1) + D
            Do While B(I) = B(J)
                If B(I) = 0 Or B(J) = 0 Then Exit Do
                I = I + 1
                J = J + 1
            If CInt(B(I)) - CInt(B(J)) < 0 Then P(LP) = P(LP - 1) Else Exit For
        Next LP
        P(LP) = TMP
    Next RP
End Sub

David B. Ring
Thanks for your registration, follow us on our social networks to keep up-to-date