Sorting Custom Collections: Page 5

Did you know that the .NET Framework has no built-in functionality to sort custom type collections? This article provides you with an easy way to use T-SQL-like sort expressions to sort your custom type collections and explains how this great utility works under the hood.

SortableCollectionBase in .NET 2.0
Before moving on to testing, I'd like to make a few observations about how this will work in .NET 2.0, currently known as "Whidbey." As far as I can tell, Microsoft is not building this kind of sorting functionality into the Framework. This is probably because of the potential memory bloat and performance hit of the dynamic assembly creation versus the perceived benefit. Then again, maybe it just wasn't on their To Do list.

In any case, what this means is that you can port the SortableCollectionBase to .NET 2.0 and pick up a few nice benefits. The first and most obvious benefit is that you can declare a type parameter (see http://www.15seconds.com/Issue/040525.htm for my article on generics if you're not familiar with this terminology). Doing this allows you to reuse the SortableCollectionBase for any type without having to declare collections for each type. In fact, the SortableCollection<T> example included in the sample code for this article inherits from System.Collections.Generic.List<T>, which provides you with all the functionality of that generic type and adds the sorting functionality available in SortableCollectionBase.
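To make the shape of the port concrete, here is a minimal sketch of what SortableCollection&lt;T&gt; might look like. The class and method names follow the article's description; the comparer-generation internals (the CodeDOM compilation covered earlier) are elided behind a hypothetical GetComparer helper.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the ported collection: List<T> supplies the standard generic
// collection behavior, and Sort(string) adds the T-SQL-like sorting.
public class SortableCollection<T> : List<T>
{
    // Sorts the collection using a T-SQL-like sort expression,
    // e.g. "LastName ASC, FirstName DESC".
    public void Sort(string sortExpression)
    {
        // Build (or retrieve from cache) a dynamically generated
        // IComparer<T> for this expression, then delegate to List<T>.Sort.
        IComparer<T> comparer = GetComparer(sortExpression);
        this.Sort(comparer);
    }

    // Placeholder for the dynamic comparer generation described earlier
    // in the article (CodeDOM compilation of a comparer type).
    private IComparer<T> GetComparer(string sortExpression)
    {
        throw new NotImplementedException();
    }
}
```

Because the type parameter flows through to the comparer, one class now covers every custom type that previously needed its own collection declaration.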

In addition, you could refactor the GetNewView method to return a SortableCollection<T>, eliminating the need for one of the overloads and the System.Type method parameter as well as the Reflection needed to instantiate the return type.

The only really tricky part I found in porting to .NET 2.0 was that the System.Collections.Generic.List&lt;T&gt;.Sort method requires a System.Collections.Generic.IComparer&lt;T&gt; as a parameter instead of the old non-generic IComparer, so I had to figure out how to pass a type argument "T" to the .NET 2.0 type parameter "T" via Reflection. Searching online, I found the BindGenericParameters method on the System.Type type. The following lines demonstrate what you need to do to pass type arguments via Reflection.

System.Type constructedType, genericType;
genericType = compiled.CompiledAssembly.GetType(
    dynamicTypeName + "`1");
constructedType = genericType.BindGenericParameters(
    new Type[1] { typeof(T) });
comparer = constructedType.GetConstructor(
    new Type[0] {}).Invoke(null)
    as System.Collections.Generic.IComparer<T>;

As you can see, first you need to get an instance of the generic type. The backtick (`) plus integer indicates the arity of the generic type, that is, the number of type parameters that the generic type has. In this case, I only have one type parameter, so it is "`1" for this example. Once you get the generic type instance, you call BindGenericParameters on it, passing in a System.Type array of types to use (the order of the array should match the order of the type parameters from left to right). BindGenericParameters returns the constructed type that you can use to create an instance of the new comparer.
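For reference, in the final release of .NET 2.0 this Whidbey-era method was renamed: BindGenericParameters became Type.MakeGenericType. A self-contained sketch of the same binding step follows, using the Framework's Comparer&lt;&gt; as a stand-in for the article's dynamically compiled comparer type (which isn't available here).

```csharp
using System;
using System.Collections.Generic;

class BindDemo
{
    static void Main()
    {
        // Get the open generic type definition; arity 1, hence "`1"
        // when looking it up by name. Comparer<> stands in for the
        // dynamically compiled comparer from the article.
        Type genericType = typeof(Comparer<>);

        // Bind the type argument: Comparer<> becomes Comparer<string>.
        // In RTM .NET 2.0, MakeGenericType replaces BindGenericParameters.
        Type constructedType = genericType.MakeGenericType(typeof(string));

        // With the article's dynamic type you would now instantiate it,
        // e.g. via Activator.CreateInstance(constructedType).
        Console.WriteLine(constructedType.FullName);
    }
}
```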

Using generics means that you do not have to cast the type in the IComparer<T> implementation, but you do have to apply a constraint on the generic comparer that specifies it must be the type of your custom object so that you can access the properties of that type in your comparer. Thus, in reality, you really don't have much of a generic comparer in terms of reusability, but at least you gain a bit of performance by not casting.
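A sketch of such a constrained comparer is shown below. The Order type and its OrderDate member are illustrative stand-ins for your custom object; the point is the "where T : Order" clause, which lets Compare access Order's members without any cast while tying the comparer to that type.

```csharp
using System;
using System.Collections.Generic;

// Illustrative custom type standing in for your domain object.
public class Order
{
    public DateTime OrderDate;
}

// The constraint "where T : Order" lets Compare use Order's members
// directly (no cast), at the cost of limiting reuse to Order and its
// subtypes.
public class OrderDateComparer<T> : IComparer<T> where T : Order
{
    public int Compare(T x, T y)
    {
        return x.OrderDate.CompareTo(y.OrderDate);
    }
}
```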

Testing SortableCollectionBase
You now know all you need to know (and more) to get started sorting your custom object collections easily and quickly. The sample code with this article also includes a test project that has test fixtures you can use with NUnit 2.1 or 2.2. The test uses the Northwind database as a data source for collection data. It declares an Order class with a few properties (for expediency) that will be used to sort upon, and it declares an OrderCollection that inherits from SortableCollectionBase.

The sample code includes one test method (see Listing 6), and this method serves a dual purpose. It first determines how much time it takes to sort the collection, and then it checks the validity of the sort. To test the validation, the test method compares the sort result of the SortableCollectionBase.Sort method (testGroup) with the sort result of the same sort expression passed to SQL Server (controlGroup).

The first line of the test Sort method calls ReloadValues, which simply opens up the SortableCollectionBase.xml file and retrieves relevant variable settings, such as the select statement, sort expression, connection string, and whether or not to test only the sorting time. The last value can save time when testing very large collections (so you don't have to create it twice and compare).

The Sort method creates an OrderCollection, calls the SortableCollectionBase.Sort method on it (timing how long this takes), and then, if validation is requested in the configuration file, it loads up the same items from SQL Server using the same sort expression and compares each item in the test group with those of the control group.

The key for the validation comparisons is to only compare the values that were sorted upon because they are the only ones guaranteed to be equal. Since the QuickSort algorithm is not stable, there is no guarantee that items that are equal (in terms of values compared in the sort) will remain in the order that they were in prior to the sort. So, for instance, if you had a collection of employees sorted by last name and you wanted to further sort them by department, there is no guarantee that after sorting by department they will still be sorted by last name. What this means is that you need to specify every sort criterion needed when calling the Sort method.
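In terms of the Sort method, that means passing one compound expression rather than chaining separate calls. A brief illustration (the employees collection and property names are hypothetical):

```csharp
// Correct: one expression listing every key, most significant first,
// so the comparer evaluates them together.
employees.Sort("Department ASC, LastName ASC");

// Incorrect: because the underlying QuickSort is not stable, the second
// call may scramble the LastName ordering established by the first.
employees.Sort("LastName ASC");
employees.Sort("Department ASC");
```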

If you look closely at the test helper methods, you'll notice that I use a lot of Reflection. This is simply to add flexibility to the test code, so you can, for instance, change the sort expression in the configuration file without having to update the test code. If you find this confusing, just stick to actually running the tests by modifying the configuration file values, trying different sort expressions and different numbers of items selected.

If you're familiar with the Northwind database, you may be wondering how I can use it as a test source for large collections, since the largest table, Orders, has only around 850 records. The answer is to run the two included SQL scripts: first OrdersAddEntropy.sql, which adds an "Entropy" column to the Orders table, and then BloatOrder.sql, which increases the record count in the Orders table to over 100,000 by re-inserting all of the rows in the table seven times (doubling the row count each pass). The Entropy column adds some semi-random data to the rows, so you can use it as the last value in the sort expression to provide some uniqueness (since otherwise you're just looking at a bunch of copies of the original 850 rows). After running these, you should have a sufficiently large data source to build out some huge collections.

In my tests, I found that sorting a collection of 10,000 items took about 250 milliseconds on my 2.2 GHz Pentium 4M notebook. Sorting 100,000 items took only around 1.4-1.5 seconds, which is really quite fast all things considered; however, I wouldn't recommend keeping that many custom objects in memory as a general rule. I found that the retrieval and creation of the 100,000 items took about a minute, so you can see where the real bottleneck is in that situation. After working out a few quirks in the test logic, I found all tests validated, so the sorting logic is dependable as well as fast.

All in all, I was quite pleased with the final product in terms of dependability and speed. However, let me suggest a few caveats to consider. The main one is the already-mentioned potential memory bloat caused by creating a bunch of dynamic assemblies. Whether or not this is a problem will depend on the target hardware and the application itself, but generally I wouldn't expect it to be an issue except in large applications (applications with many domain objects and collections) with high use. In that case, I recommend profiling and testing to ensure using this approach will meet your performance goals.

The other caveat is more a question of architecture in that the usefulness of this facility will largely depend on whether or not your architecture can use sortable custom object collections. All the same, it wouldn't hurt to add this to your library to have if you ever do find a use for it - at least you won't throw your hands up in despair when you find you want to sort your collections. I know I'm glad to have this in my toolkit now!

You can download the source code here.

J. Ambrose Little currently works as the Codemunicator for Infragistics, the leader in presentation layer components. He's contributed to two books, Professional ADO.NET 2 and ASP.NET 2 MVP Hacks, both by Wrox, and he speaks at local events and conferences when he can. You can reach him via e-mail or his blog.