

LINQ Into Microsoft's New Query Capabilities

Query features have long been a cornerstone of database applications, but with LINQ, Microsoft introduces query language features right inside C# and VB.NET.

At PDC 2005, Microsoft introduced a brand-new technology known as LINQ, which stands for "Language Integrated Query."

The feature set hiding behind this acronym is truly mind-boggling and worthy of a lot of attention. In short, LINQ brings a query language similar to SQL Server's T-SQL into C# and VB.NET. Imagine that you could issue something like a "select * from customers" statement within C# or VB.NET. This sounds intriguing, but it doesn't begin to communicate the power of LINQ. LINQ lets you apply query-style syntax to objects, not just to data sources such as databases. This difference, as well as some of the implementation specifics, makes LINQ significantly more powerful than other query languages.

Needless to say, processing information of any sort is of utmost importance in software. Much of this information is stored in databases in the form of tables and rows, and to process that data, developers use relatively sophisticated query and data manipulation mechanisms. Yet not all data is stored in databases. I would even argue that today, most data is not. Much of it lives in places like XML files, HTML pages, e-mails, and the like, and the ability to query this sort of information is currently much less developed than it is for databases.

Furthermore, data is not useful merely sitting in databases or XML files. Applications bring data into memory to process it, and once data leaves its original place of storage, the fundamental need to handle and manipulate it does not change. Yet in current versions of .NET (as well as in many, but not all, other programming languages), the ability to handle data at that point is relatively poor. It is easy to retrieve a list of customers joined with their invoice information from SQL Server, but it is not easy to take customer information in memory in .NET and join it with the customer's e-mails. From a .NET point of view, different types of information are usually available in object form, and unfortunately, Microsoft has not provided a good way to join lists of objects or perform any other sort of query operation on them.

LINQ solves this problem, and many others as well. This puts LINQ at the very top of my "technologies I want today" list. Unfortunately, Microsoft has made LINQ available only as a CTP (Community Technology Preview), which means that it isn't even in beta yet. Ultimately, the expectation is that LINQ will ship with Visual Studio "Orcas." You can install the LINQ CTP bits on top of Visual Studio 2005; they provide a number of additional assemblies as well as new versions of the C# and VB.NET compilers. With this setup, you can write and compile the new LINQ syntax in Visual Studio.

Author's Note: IntelliSense and syntax coloring do not always work correctly with the new features because the Visual Studio editor is not yet aware of LINQ.
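To make the in-memory join scenario described above a little more concrete, here is a minimal C# sketch. It assumes the query syntax as it eventually shipped in C# 3.0 (the CTP bits may differ in details), and the Customer and Email classes and the JoinCustomersWithEmails method are hypothetical names invented purely for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only; they are not part of LINQ.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Email
{
    public int CustomerId { get; set; }
    public string Subject { get; set; }
}

public static class CustomerEmailJoin
{
    // Joins in-memory customers with their e-mails on the customer ID
    // and returns one "Name: Subject" string per matching pair.
    // E-mails whose CustomerId matches no customer are dropped (inner join).
    public static List<string> JoinCustomersWithEmails(
        IEnumerable<Customer> customers, IEnumerable<Email> emails)
    {
        IEnumerable<string> result =
            from c in customers
            join e in emails on c.Id equals e.CustomerId
            select c.Name + ": " + e.Subject;
        return result.ToList();
    }
}
```

Passing two customers (IDs 1 and 2) and e-mails addressed to customer IDs 1, 2, and 3 yields one string per matching pair; the e-mail for the nonexistent customer 3 is simply left out, much like an inner join in T-SQL.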

A First Example
In SQL Server, queries are pretty simple. For instance, you can easily query all records from a Customer table in the following fashion:

   SELECT * FROM Customer

The return value is a "result set" (which really behaves and appears very much like a table) that contains all fields from the Customer table. The overall situation is relatively simple and predictable for the compiler (or interpreter) that has to process this statement. "Customer" is always a table (or equivalent construct, such as a view, which is really also a table in terms of behavior and functionality). Inside the Customer table you'll find rows of data, and each row is composed of a number of fields, all of which you expect to be part of the result set.

Using LINQ you can perform a similar operation right in C# or VB.NET. The main difference is that C# doesn't deal with tables but objects, and in particular, lists of objects (be it collections or arrays or any similar construct). To start out with a simple example, I will use one of the simplest data sources LINQ can use: an array of strings. Here it is in C#:

   string[] names = 
      {"Markus", "Ellen", "Franz", "Erna" };

Or, the VB.NET equivalent:

   Dim names As String() = _
      {"Markus", "Ellen", "Franz", "Erna"}

Using the new LINQ syntax, your code can query this "data source" in a manner similar to querying a table in T-SQL. Here is a simple VB.NET query that retrieves all "records" from that array:

   Select name From name in names

This is somewhat similar to the T-SQL statement above. The main difference is the "from x in y" syntax, which you might find a little confusing at first. Let's take a look at what really happens here. Fundamentally, you retrieve data from a list of objects called "names." This list is an array object, the equivalent of the Customer table in my previous T-SQL statement. The main difference is that while it is completely clear that a Customer table contains rows, it is not at all clear what kind of objects a collection contains. Therefore, you also need to specify what you expect inside that collection. In my case, I've stated that I want to refer to each object inside the names array as "name." You might compare this to a For Each loop:

   For Each name As String In names
      ' name.xxx
   Next

You must give each element you expect inside the collection a name so that you can subsequently use that named reference to specify the expected result (among other things). In the above example, I stated that I want to select the entire string (called "name") as my result set, since that is really all there is to select in this simple example.
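For readers who think in loops, the following C# sketch puts the two approaches side by side: a query expression with the range variable name, and a hand-written foreach loop in which name plays the same role. It assumes the C# query syntax as it eventually shipped; the RangeVariableDemo class and its method names are hypothetical. Both methods build the same list of strings:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class for illustration.
public static class RangeVariableDemo
{
    // Query expression: "name" is the range variable that stands for
    // each element of the names array in turn.
    public static List<string> SelectWithQuery(string[] names)
    {
        IEnumerable<string> result = from name in names select name;
        return result.ToList();
    }

    // Equivalent hand-written loop: the loop variable "name" plays the
    // same role as the range variable in the query above.
    public static List<string> SelectWithLoop(string[] names)
    {
        var result = new List<string>();
        foreach (string name in names)
        {
            result.Add(name);
        }
        return result;
    }
}
```

The query states only which elements to take and what to produce from each; the loop spells out how the result list is built step by step.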

Of course, since VB.NET is an object-oriented environment, the result set must also be a list of objects. I therefore need to assign the SELECT statement (or the result of the SELECT statement) to a variable reference:

   Dim result As IEnumerable(Of String) = _
      Select name From name in names

SELECT statements return a list of type IEnumerable(Of T) (IEnumerable<T> in C# notation). In other words, the result set is a list typed as the generic version of IEnumerable. In my case, the elements in that generic list are strings, since each "name" in the SELECT statement is a string.

Of course, C# also supports LINQ natively. Consider this C# version.

   IEnumerable<string> result =
      from name in names select name;

The main difference here is that C# always puts the "select" part last. This looks a bit odd at first, but I could argue that it makes more sense. For instance, when I described what the VB.NET example does a few paragraphs above, I had to start my explanation with the "from" part. This order is also more convenient for IntelliSense: once you've typed the "from name in names" part, IntelliSense can display a sensible list of possible "selectable" members, while in VB the data source has not yet been named by the time you're likely to type the select part. Ultimately, this comes down to personal preference, since the functionality is identical. (This seems to be true for the majority of features in C# and VB.NET.)
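The following C# sketch illustrates two consequences of putting "from" first. It assumes the C# 3.0 behavior as it eventually shipped, where the compiler rewrites a query expression into ordinary method calls (a query such as from name in names select name becomes names.Select(name => name), using the Enumerable.Select extension method), and where the range variable's type is known by the time the select clause is reached. The CSharpQueryDemo class and its method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class for illustration.
public static class CSharpQueryDemo
{
    // The article's query: "from" comes first, so by the time the compiler
    // reaches "select", it already knows that name is a string.
    public static IEnumerable<string> QuerySyntax(string[] names)
    {
        return from name in names select name;
    }

    // What the C# 3.0 compiler turns the query above into: a call to the
    // Enumerable.Select extension method with an identity lambda.
    public static IEnumerable<string> MethodSyntax(string[] names)
    {
        return names.Select(name => name);
    }

    // Because name is known to be a string, its members can be selected.
    // The element type of the result follows the selected expression,
    // so this query yields IEnumerable<int> rather than IEnumerable<string>.
    public static List<int> NameLengths(string[] names)
    {
        IEnumerable<int> lengths = from name in names select name.Length;
        return lengths.ToList();
    }
}
```

With the four names from the earlier example, NameLengths returns 6, 5, 5, and 4, while QuerySyntax and MethodSyntax both return the original four names in order.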
