devxlogo

MSD RadixSort – An algorithm that can achieve O(N) behavior

MSD RadixSort – An algorithm that can achieve O(N) behavior

' MSD RadixSort.  No sort based on comparisons can be faster than O(N log N).  ' RadixSort makes no comparisons and can therefore achieve O(N) behavior.  To ' do this, it examines keys one byte at a time, counting the number of keys ' that have each possible byte value.  The counts are then used to build an ' offset table specifying the sorted order of the keys.  It's easiest to work ' backwards from the least significant digit of the keys (LSD Radix),'  since order based on more significant digits is not disturbed by less ' significant digits.  Unfortunately, the LSD approach requires padding short ' keys if key length is variable, and guarantees that all digits will be ' examined even if the first 3-4 digits contain all the information needed to ' achieve sorted order.  Most significant digit (MSD) RadixSort takes a lot ' more bookkeeping -- the list must repeatedly be split into sublists for each ' value of the last digit processed -- but the pay-off is that only as many ' digits will be examined as are needed.  As in QuickSort and MergeSort,'  it's worthwhile to hand off the sublists to InsertionSort when they get ' short enough.'' MSD RadixSort is stable and runs in linear O(N) time.  It is fairly memory ' intensive, needing space for some extra counting arrays and stack space for ' recursive calls, but still uses less memory than MergeSort.  This version is ' set up for strings and would take some work to adapt to other data types.  ' Integers and longs could be handled by converting them to strings (and ' limiting the arrays CNT and IND to the ten numerical digit values plus the ' minus sign); this is probably worthwhile only if you have hundreds of ' thousands of integers or longs to sort.  Doubles are more of a challenge,'  requiring conversion to an array of bytes by type casting.  I have not ' pursued this, since VBA on the Mac does not include the CopyMemory function ' that enables type casting on Windows systems.  '' NOTE:  This version is set up to count byte values from 1 to 127.  If your ' keys use (for instance) only lowercase or only uppercase alphabetic ' characters or only numerical digits, you can trim the CNT and IND arrays  and ' your sorts will run correspondingly faster.'' Reference:  P. M. McIlroy, K. Bostic and M. D. McIlroy,'  "Engineering  Radix Sort", Computing Systems 6(1):5-27 (1993).'' Speed:  MSDRadixSortS sorts 500,000 random strings in 28.3 sec; sorts 100186 ' library call numbers in 21.3 sec; sorts 25479 dictionary words in 1.8 sec ' (random order), 1.5 sec (presorted) or 1.9 sec (reverse sorted).  Timed in ' Excel 2001 on an 800 mhz PowerBook.'' Bottom line:  complex and best suited to strings, but there's nothing faster ' for really long lists.' Usage:  Dim S1(L To R) As StringsDim B1(1 To nChars) As ByteDim P1(L To R) As LongFor I = L To R    S1(I) = GetRandomString()Next IStrsToBytes S1, B1, P1    'a routine that stores the strings in 0 terminated                           ' byte arrays with                    'P1() holding pointers to the start of each byte series.MSDRadixSortS L, R, B1, P1' CODE:Sub MSDRadixSortS(N As Long)    Dim CH() As Integer        ReDim CH(1 To N)      'See below for type PILE (stack records)    ReDim STACK(1 To 1000)    StackPtr = 1    With STACK(StackPtr)        .L = 1        .R = N        .D = 0    End With    StackPtr = StackPtr + 1    RadixS CHEnd SubSub RadixS(CH() As Integer)    Dim L As Long    Dim R As Long    Dim D As Integer    Dim I As Long    Dim J As Long    Dim TMP As Long    Dim C As Integer    Dim NextC As Integer    Dim CNT(-1 To 127) As Long    Dim IND(-1 To 127) As Long    Dim NuCnt As Long    Dim BigCnt As Long    Dim OldSp As Long    Dim BigSp As Long        While StackPtr > 1    'Pop a PILE off the stack.        StackPtr = StackPtr - 1    'Get left and right limits of sublist and depth of byte to examine        With STACK(StackPtr)            L = .L            R = .R            D = .D        End With    'Sublists of         If R - L > 24 Then        'Clear the count array.            For I = -1 To 127                CNT(I) = 0            Next I        'Get the byte at depth D in each key and count the number of each byte         ' value.            For I = L To R                C = CInt(B(P(I) + D))                CH(I) = C                CNT(C) = CNT(C) + 1            Next I        'We will add the counts to create sorted addresses in array IND().            IND(-1) = L        'At the same time we'll push the sublists for each byte value onto the         ' stack.            OldSp = StackPtr        'We'll track the biggest sublist and make sure it comes off the stack         ' last;        'this ensures the stack will not get more than logarithmically deep.            BigCnt = 0        'Now we build the addresses out of the counts.            For I = 0 To 127                J = I - 1                NuCnt = CNT(J)                IND(I) = IND(J) + NuCnt         'If there is a sublist of keys starting with the current byte value,         '  stack it.                If NuCnt > 1 Then                    With STACK(StackPtr)                        .L = IND(J)                        .R = IND(I) - 1                        .D = D + 1                    End With                    StackPtr = StackPtr + 1             'Keep track of the largest count / sublist.                    If NuCnt > BigCnt Then                        BigCnt = NuCnt                        BigSp = StackPtr                    End If                End If            Next I        'Swap the biggest sublist down into the stack so it comes off last.            TMP = BigSp            BigSp = OldSp            OldSp = TMP        'Now use the counts to move the pointers to their sorted positions         ' based on the        'bytes examined so far; doing this in place this gets a bit ugly.            For I = L To R         'We will use the byte at I to find what address P(I) should be mapped          ' to.                C = CH(I)         'We use -1 to flag pointers already moved.                CH(I) = -1         'If C = -1 we skip the loop and increment I until we find a pointer          ' not already moved.                Do While C >= 0             'We go to IND(C) to get the destination address for P(I).                    J = IND(C)             'We swap the current pointer for the one at that address.                    TMP = P(I)                    P(I) = P(J)                    P(J) = TMP             'Now we determine where to map the pointer we just displaced.             'We get the byte value from its key.                    NextC = CH(J)             'We flag it as moved.                    CH(J) = -1             'Once we've used each address we increment it unless we've hit R.                    If J 'We set C to the byte of the key of the displaced pointer; ready              ' to loop!                    C = NextC                Loop         'If a series of displacements circles back to a pointer already moved,         '  we          'increment I until we find a pointer not yet moved or until we end at          ' R.            Next I        Else            DeepInsertS B, P, L, 1 + R - L, D        End If    WendEnd SubType PILE    L As Long    R As Long    D As IntegerEnd TypeDim STACK() As PILEDim StackPtr As IntegerSub DeepInsertS(B() As Byte, P() As Long, L As Long, N As Long, D As Integer)    Dim LP As Long    Dim RP As Long    Dim TMP As Long    Dim I As Long    Dim J As Long        For RP = L + 1 To L + N - 1        TMP = P(RP)        For LP = RP To L + 1 Step -1            I = TMP + D            J = P(LP - 1) + D            Do While B(I) = B(J)                If B(I) = 0 Or B(J) = 0 Then Exit Do                I = I + 1                J = J + 1            Loop            If CInt(B(I)) - CInt(B(J)) 

devx-admin

Share the Post: