Login | Register   
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


Tip of the Day
Language: VB5, VB6
Expertise: Advanced
Jan 13, 2003

MSD RadixSort - An algorithm that can achieve O(N) behavior

' MSD RadixSort.  No sort based on comparisons can be faster than O(N log N).  
' RadixSort makes no comparisons and can therefore achieve O(N) behavior.  To 
' do this, it examines keys one byte at a time, counting the number of keys 
' that have each possible byte value.  The counts are then used to build an 
' offset table specifying the sorted order of the keys.  It's easiest to work 
' backwards from the least significant digit of the keys (LSD Radix),
'  since order based on more significant digits is not disturbed by less 
' significant digits.  Unfortunately, the LSD approach requires padding short 
' keys if key length is variable, and guarantees that all digits will be 
' examined even if the first 3-4 digits contain all the information needed to 
' achieve sorted order.  Most significant digit (MSD) RadixSort takes a lot 
' more bookkeeping -- the list must repeatedly be split into sublists for each 
' value of the last digit processed -- but the pay-off is that only as many 
' digits will be examined as are needed.  As in QuickSort and MergeSort,
'  it's worthwhile to hand off the sublists to InsertionSort when they get 
' short enough.
'
' MSD RadixSort is stable and runs in linear O(N) time.  It is fairly memory 
' intensive, needing space for some extra counting arrays and stack space for 
' recursive calls, but still uses less memory than MergeSort.  This version is 
' set up for strings and would take some work to adapt to other data types.  
' Integers and longs could be handled by converting them to strings (and 
' limiting the arrays CNT and IND to the ten numerical digit values plus the 
' minus sign); this is probably worthwhile only if you have hundreds of 
' thousands of integers or longs to sort.  Doubles are more of a challenge,
'  requiring conversion to an array of bytes by type casting.  I have not 
' pursued this, since VBA on the Mac does not include the CopyMemory function 
' that enables type casting on Windows systems.  
'
' NOTE:  This version is set up to count byte values from 1 to 127.  If your 
' keys use (for instance) only lowercase or only uppercase alphabetic 
' characters or only numerical digits, you can trim the CNT and IND arrays  and 
' your sorts will run correspondingly faster.
'
' Reference:  P. M. McIlroy, K. Bostic and M. D. McIlroy,
'  "Engineering  Radix Sort", Computing Systems 6(1):5-27 (1993).
'
' Speed:  MSDRadixSortS sorts 500,000 random strings in 28.3 sec; sorts 100186 
' library call numbers in 21.3 sec; sorts 25479 dictionary words in 1.8 sec 
' (random order), 1.5 sec (presorted) or 1.9 sec (reverse sorted).  Timed in 
' Excel 2001 on an 800 mhz PowerBook.
'
' Bottom line:  complex and best suited to strings, but there's nothing faster 
' for really long lists.

' Usage:  

Dim S1(L To R) As Strings
Dim B1(1 To nChars) As Byte
Dim P1(L To R) As Long

For I = L To R
    S1(I) = GetRandomString()
Next I

StrsToBytes S1, B1, P1    'a routine that stores the strings in 0 terminated 
                          ' byte arrays with
                    'P1() holding pointers to the start of each byte series.

MSDRadixSortS L, R, B1, P1


' CODE:

Sub MSDRadixSortS(N As Long)
    Dim CH() As Integer
    
    ReDim CH(1 To N)  
    'See below for type PILE (stack records)
    ReDim STACK(1 To 1000)
    StackPtr = 1
    With STACK(StackPtr)
        .L = 1
        .R = N
        .D = 0
    End With
    StackPtr = StackPtr + 1
    RadixS CH
End Sub

Sub RadixS(CH() As Integer)
    Dim L As Long
    Dim R As Long
    Dim D As Integer
    Dim I As Long
    Dim J As Long
    Dim TMP As Long
    Dim C As Integer
    Dim NextC As Integer
    Dim CNT(-1 To 127) As Long
    Dim IND(-1 To 127) As Long
    Dim NuCnt As Long
    Dim BigCnt As Long
    Dim OldSp As Long
    Dim BigSp As Long
    
    While StackPtr > 1
    'Pop a PILE off the stack.
        StackPtr = StackPtr - 1
    'Get left and right limits of sublist and depth of byte to examine
        With STACK(StackPtr)
            L = .L
            R = .R
            D = .D
        End With
    'Sublists of <= 24 keys will be finished by InsertionSort
        If R - L > 24 Then
        'Clear the count array.
            For I = -1 To 127
                CNT(I) = 0
            Next I
        'Get the byte at depth D in each key and count the number of each byte 
        ' value.
            For I = L To R
                C = CInt(B(P(I) + D))
                CH(I) = C
                CNT(C) = CNT(C) + 1
            Next I
        'We will add the counts to create sorted addresses in array IND().
            IND(-1) = L
        'At the same time we'll push the sublists for each byte value onto the 
        ' stack.
            OldSp = StackPtr
        'We'll track the biggest sublist and make sure it comes off the stack 
        ' last;
        'this ensures the stack will not get more than logarithmically deep.
            BigCnt = 0
        'Now we build the addresses out of the counts.
            For I = 0 To 127
                J = I - 1
                NuCnt = CNT(J)
                IND(I) = IND(J) + NuCnt
         'If there is a sublist of keys starting with the current byte value,
         '  stack it.
                If NuCnt > 1 Then
                    With STACK(StackPtr)
                        .L = IND(J)
                        .R = IND(I) - 1
                        .D = D + 1
                    End With
                    StackPtr = StackPtr + 1
             'Keep track of the largest count / sublist.
                    If NuCnt > BigCnt Then
                        BigCnt = NuCnt
                        BigSp = StackPtr
                    End If
                End If
            Next I
        'Swap the biggest sublist down into the stack so it comes off last.
            TMP = BigSp
            BigSp = OldSp
            OldSp = TMP
        'Now use the counts to move the pointers to their sorted positions 
        ' based on the
        'bytes examined so far; doing this in place this gets a bit ugly.
            For I = L To R
         'We will use the byte at I to find what address P(I) should be mapped 
         ' to.
                C = CH(I)
         'We use -1 to flag pointers already moved.
                CH(I) = -1
         'If C = -1 we skip the loop and increment I until we find a pointer 
         ' not already moved.
                Do While C >= 0
             'We go to IND(C) to get the destination address for P(I).
                    J = IND(C)
             'We swap the current pointer for the one at that address.
                    TMP = P(I)
                    P(I) = P(J)
                    P(J) = TMP
             'Now we determine where to map the pointer we just displaced.
             'We get the byte value from its key.
                    NextC = CH(J)
             'We flag it as moved.
                    CH(J) = -1
             'Once we've used each address we increment it unless we've hit R.
                    If J < R Then IND(C) = J + 1 Else Exit Do
             'We set C to the byte of the key of the displaced pointer; ready 
             ' to loop!
                    C = NextC
                Loop
         'If a series of displacements circles back to a pointer already moved,
         '  we 
         'increment I until we find a pointer not yet moved or until we end at 
         ' R.
            Next I
        Else
            DeepInsertS B, P, L, 1 + R - L, D
        End If
    Wend
End Sub

Type PILE
    L As Long
    R As Long
    D As Integer
End Type

Dim STACK() As PILE
Dim StackPtr As Integer

Sub DeepInsertS(B() As Byte, P() As Long, L As Long, N As Long, D As Integer)
    Dim LP As Long
    Dim RP As Long
    Dim TMP As Long
    Dim I As Long
    Dim J As Long
    
    For RP = L + 1 To L + N - 1
        TMP = P(RP)
        For LP = RP To L + 1 Step -1
            I = TMP + D
            J = P(LP - 1) + D
            Do While B(I) = B(J)
                If B(I) = 0 Or B(J) = 0 Then Exit Do
                I = I + 1
                J = J + 1
            Loop
            If CInt(B(I)) - CInt(B(J)) < 0 Then P(LP) = P(LP - 1) Else Exit For
        Next LP
        P(LP) = TMP
    Next RP
End Sub

David B. Ring
 
Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap