devxlogo

MSD RadixSort – An algorithm that can achieve O(N) behavior

MSD RadixSort – An algorithm that can achieve O(N) behavior

' MSD RadixSort.  No sort based on comparisons can be faster than O(N log N).  ' RadixSort makes no comparisons and can therefore achieve O(N) behavior.  To ' do this, it examines keys one byte at a time, counting the number of keys ' that have each possible byte value.  The counts are then used to build an ' offset table specifying the sorted order of the keys.  It's easiest to work ' backwards from the least significant digit of the keys (LSD Radix),'  since order based on more significant digits is not disturbed by less ' significant digits.  Unfortunately, the LSD approach requires padding short ' keys if key length is variable, and guarantees that all digits will be ' examined even if the first 3-4 digits contain all the information needed to ' achieve sorted order.  Most significant digit (MSD) RadixSort takes a lot ' more bookkeeping -- the list must repeatedly be split into sublists for each ' value of the last digit processed -- but the pay-off is that only as many ' digits will be examined as are needed.  As in QuickSort and MergeSort,'  it's worthwhile to hand off the sublists to InsertionSort when they get ' short enough.'' MSD RadixSort is stable and runs in linear O(N) time.  It is fairly memory ' intensive, needing space for some extra counting arrays and stack space for ' recursive calls, but still uses less memory than MergeSort.  This version is ' set up for strings and would take some work to adapt to other data types.  ' Integers and longs could be handled by converting them to strings (and ' limiting the arrays CNT and IND to the ten numerical digit values plus the ' minus sign); this is probably worthwhile only if you have hundreds of ' thousands of integers or longs to sort.  Doubles are more of a challenge,'  requiring conversion to an array of bytes by type casting.  I have not ' pursued this, since VBA on the Mac does not include the CopyMemory function ' that enables type casting on Windows systems.  '' NOTE:  This version is set up to count byte values from 1 to 127.  If your ' keys use (for instance) only lowercase or only uppercase alphabetic ' characters or only numerical digits, you can trim the CNT and IND arrays  and ' your sorts will run correspondingly faster.'' Reference:  P. M. McIlroy, K. Bostic and M. D. McIlroy,'  "Engineering  Radix Sort", Computing Systems 6(1):5-27 (1993).'' Speed:  MSDRadixSortS sorts 500,000 random strings in 28.3 sec; sorts 100186 ' library call numbers in 21.3 sec; sorts 25479 dictionary words in 1.8 sec ' (random order), 1.5 sec (presorted) or 1.9 sec (reverse sorted).  Timed in ' Excel 2001 on an 800 mhz PowerBook.'' Bottom line:  complex and best suited to strings, but there's nothing faster ' for really long lists.' Usage:  Dim S1(L To R) As StringsDim B1(1 To nChars) As ByteDim P1(L To R) As LongFor I = L To R    S1(I) = GetRandomString()Next IStrsToBytes S1, B1, P1    'a routine that stores the strings in 0 terminated                           ' byte arrays with                    'P1() holding pointers to the start of each byte series.MSDRadixSortS L, R, B1, P1' CODE:Sub MSDRadixSortS(N As Long)    Dim CH() As Integer        ReDim CH(1 To N)      'See below for type PILE (stack records)    ReDim STACK(1 To 1000)    StackPtr = 1    With STACK(StackPtr)        .L = 1        .R = N        .D = 0    End With    StackPtr = StackPtr + 1    RadixS CHEnd SubSub RadixS(CH() As Integer)    Dim L As Long    Dim R As Long    Dim D As Integer    Dim I As Long    Dim J As Long    Dim TMP As Long    Dim C As Integer    Dim NextC As Integer    Dim CNT(-1 To 127) As Long    Dim IND(-1 To 127) As Long    Dim NuCnt As Long    Dim BigCnt As Long    Dim OldSp As Long    Dim BigSp As Long        While StackPtr > 1    'Pop a PILE off the stack.        StackPtr = StackPtr - 1    'Get left and right limits of sublist and depth of byte to examine        With STACK(StackPtr)            L = .L            R = .R            D = .D        End With    'Sublists of <= 24 keys will be finished by InsertionSort        If R - L > 24 Then        'Clear the count array.            For I = -1 To 127                CNT(I) = 0            Next I        'Get the byte at depth D in each key and count the number of each byte         ' value.            For I = L To R                C = CInt(B(P(I) + D))                CH(I) = C                CNT(C) = CNT(C) + 1            Next I        'We will add the counts to create sorted addresses in array IND().            IND(-1) = L        'At the same time we'll push the sublists for each byte value onto the         ' stack.            OldSp = StackPtr        'We'll track the biggest sublist and make sure it comes off the stack         ' last;        'this ensures the stack will not get more than logarithmically deep.            BigCnt = 0        'Now we build the addresses out of the counts.            For I = 0 To 127                J = I - 1                NuCnt = CNT(J)                IND(I) = IND(J) + NuCnt         'If there is a sublist of keys starting with the current byte value,         '  stack it.                If NuCnt > 1 Then                    With STACK(StackPtr)                        .L = IND(J)                        .R = IND(I) - 1                        .D = D + 1                    End With                    StackPtr = StackPtr + 1             'Keep track of the largest count / sublist.                    If NuCnt > BigCnt Then                        BigCnt = NuCnt                        BigSp = StackPtr                    End If                End If            Next I        'Swap the biggest sublist down into the stack so it comes off last.            TMP = BigSp            BigSp = OldSp            OldSp = TMP        'Now use the counts to move the pointers to their sorted positions         ' based on the        'bytes examined so far; doing this in place this gets a bit ugly.            For I = L To R         'We will use the byte at I to find what address P(I) should be mapped          ' to.                C = CH(I)         'We use -1 to flag pointers already moved.                CH(I) = -1         'If C = -1 we skip the loop and increment I until we find a pointer          ' not already moved.                Do While C >= 0             'We go to IND(C) to get the destination address for P(I).                    J = IND(C)             'We swap the current pointer for the one at that address.                    TMP = P(I)                    P(I) = P(J)                    P(J) = TMP             'Now we determine where to map the pointer we just displaced.             'We get the byte value from its key.                    NextC = CH(J)             'We flag it as moved.                    CH(J) = -1             'Once we've used each address we increment it unless we've hit R.                    If J < R Then IND(C) = J + 1 Else Exit Do             'We set C to the byte of the key of the displaced pointer; ready              ' to loop!                    C = NextC                Loop         'If a series of displacements circles back to a pointer already moved,         '  we          'increment I until we find a pointer not yet moved or until we end at          ' R.            Next I        Else            DeepInsertS B, P, L, 1 + R - L, D        End If    WendEnd SubType PILE    L As Long    R As Long    D As IntegerEnd TypeDim STACK() As PILEDim StackPtr As IntegerSub DeepInsertS(B() As Byte, P() As Long, L As Long, N As Long, D As Integer)    Dim LP As Long    Dim RP As Long    Dim TMP As Long    Dim I As Long    Dim J As Long        For RP = L + 1 To L + N - 1        TMP = P(RP)        For LP = RP To L + 1 Step -1            I = TMP + D            J = P(LP - 1) + D            Do While B(I) = B(J)                If B(I) = 0 Or B(J) = 0 Then Exit Do                I = I + 1                J = J + 1            Loop            If CInt(B(I)) - CInt(B(J)) < 0 Then P(LP) = P(LP - 1) Else Exit For        Next LP        P(LP) = TMP    Next RPEnd Sub

devxblackblue

About Our Editorial Process

At DevX, we’re dedicated to tech entrepreneurship. Our team closely follows industry shifts, new products, AI breakthroughs, technology trends, and funding announcements. Articles undergo thorough editing to ensure accuracy and clarity, reflecting DevX’s style and supporting entrepreneurs in the tech sphere.

See our full editorial policy.

About Our Journalist