n all the excitement about SQL Server 2005 (formerly codenamed Yukon), most people tend to focus on the .NET integration. But the native support for XML might turn out to have even more of an impact on the everyday working lives of DBAs. Native XML makes SQL Server an ideal knowledge management platform, a place for organizing memos, white papers, specifications, design documents, and corporate policies and procedures. In order to make sense of all this data we need a new query language: XQuery.
XQuery is the SQL of XML. And in many ways, the two languages are similar. Like SQL, XQuery does more than just queries; it’s closer to a general purpose data-processing language designed specifically for XML. While you can certainly use it to extract various meaningful bits of data from XML documents, XQuery can also be used for deleting, inserting, and updating data.
XQuery in SQL Server 2005
To show off some of the new XQuery functionality, I made up a fictional (and slightly dysfunctional) company called CookieCo. This bakery stores all of its recipes in a simplistic relational database (CookieCo.sql) of just three tables:
- Cookie. Each row represents a single cookie recipe.
- Ingredient. Each row represents one ingredient used in a recipe.
- Direction. Each row represents a single step in the directions of a recipe.
|Figure 1. Data Model: This Entity Relationship Diagram shows the three tables of a simple cookie-recipe database: cookie, ingredient, and direction.|
The idea of storing recipe information in a relational database may not be the best idea that CookieCo’s IT department ever came up with. But there are a few advantages over, say, a plain text file:
- All recipes live in a single, secure location.
- Recipe data is stored in a display-neutral fashion.
- Recipes can be queried without writing any specialized parsers.
The technology people at CookieCo like to be cutting-edge, so once they heard about the new XML functionality built into the next version of SQL Server, they decided to store all their recipes in the database in a field of type XML. Each recipe gets its own document, which is also stored in the Cookie table right next to the relational data that it’s replacing. So, the database contains two copies of every recipe: one copy in a relational format and one copy in XML?at least until the migration project is complete.
|What You Need|
|You should be familiar with XML and T-SQL. To really follow along, you’ll need SQL Server 2005. Subscribers can download the full beta from MSDN. Everyone else can get the express edition from Microsoft’s SQL Server Developers Center.|
‘C’ is for Cookie: Selects in T-SQL and XQuery
The rest of this article will introduce you to XQuery by way of the already familiar Transact-SQL language.
Let’s say that the bakers ordered too much salt and want to get rid of some. This simple T-SQL query finds all cookie recipes that have salt as one of the ingredients:
select Cookie.* from dbo.Cookiejoin dbo.Ingredienton Ingredient.Ingredient2Cookie = Cookie.idwhere Ingredient.name = 'Salt'
To get the same information from the corresponding XML documents, you need to use one of the new methods on fields of type XML. These new methods take XQuery expressions as string arguments, like this:
select *from dbo.Cookiewhere CookieInXML.exist('/cookie/ingredients/ingredient[name="Salt"]') =1
In this case, we’re using the exist method which (duh.) checks for the existence of some node or value. Take a closer look at the text of the actual XQuery:
This is XQuery at its simplest, using only a path expression. XPath is a slightly older XML technology used to identify nodes within an XML document. This expression checks for the existence of “Salt” within the ingredients of each cookie.
Table 1. XPath Basics: This table introduces two XPath building blocks.
|/cookie/ingredients/ingredient||A slash is used to separate nested data. In this case the English language translation is something like, “Show me all ‘ingredient’ elements nested directly within ‘ingredients’ elements nested directly within the ‘cookie’ element.|
|[name = “Salt”]||Square brackets, known as a “predicate expression,” are used to filter data. Here, it selects all ingredients with a name of “Salt”?roughly analogous to SQL’s WHERE clause.|
Now I’ll show you another translation from T-SQL to XQuery. Below is some T-SQL joining two tables in order to list all the ingredients in a particular recipe.
SELECT Ingredient.Name, Ingredient.Quantity, Ingredient.UnitFROM dbo.IngredientJOIN dbo.Cookie ON Ingredient.Ingredient2Cookie = Cookie.IDwhere Cookie.Name = 'Oatmeal Chocolate Chip'
Here’s the output:
Name Quantity Unit------------------------- ------------- ---------Vanilla Extract 1 TeaspoonButter 1 CupWhite Sugar 0.5 CupMilk 1 CupSalt 1 TeaspoonEggs 2 EachChocolate Chips 1 CupBrown Sugar 1 CupAll Purpose Flour 2 CupRolled Oats 3 CupBaking Soda 1 Teaspoon(11 row(s) affected)
The XQuery equivalent is also simple:
Select CookieInXML.query('/cookie//ingredient')from dbo.Cookiewhere CookieInXML.exist('/cookie[@name = "Oatmeal Chocolate Chip"]') =1
Here’s some of the output, which, not surprisingly, is in XML:
Vanilla Extract 1 ... Butter 1
This query isn’t much more complicated than the first, except this one uses two XPath expressions: The exist method filters rows, and the query method selects useful data from the XML document.
Table 2. More Basics: Three more building blocks include ?@?, ?//?, and ?query?.
|query||The query method displays data retrieved by the XQuery in the SQL result set.|
|@name||The “at” symbol is used within XPath to specify an attribute.|
|/cookie//ingredient||The double slash is used to select all ingredient elements within the cookie element regardless of depth.|
All Purpose FLWOR
XPath can only take you so far. Much of the SQL-style eloquence of XQuery comes by way of FLWOR expressions (pronounced “flower” or “flour” either of which lends itself to all kinds of bad jokes). The term is usually written in all caps because it’s a mnemonic to help you remember each clause in an expression: for, let, where, order by, and return.
Now imagine that the CookieCo bakers are running out of large measuring cups. In an effort to conserve, they want a list of all ingredients used in quantities greater than a single cup. Here’s the query written in T-SQL:
SELECT Name, Unit, QuantityFROM dbo.IngredientWHERE Ingredient.Quantity>1 AND Unit = 'Cup'
Here’s an equivalent XQuery using a FLWOR expression:
SELECT CookieInXml.query(' for $ingredient in /cookie//ingredient/quantity[@unit="Cup"]/.. where $ingredient/quantity>1 return $ingredient')FROM dbo.Cookie
XQuery works with sets of data. The query above takes the results of the XPath expression, filters them through the where clause, and outputs the results with no decoration using the return clause.
Table 3. Introducing FLWOR: Going beyond path expressions, FLWOR adds power and flexibility to XQuery.
|for||Associates an iterator with the results of an expression. In a FLWOR expression, at least one for or let clause is required.|
|$ingredient||XQuery prefaces variables with a dollar sign (“$”). In the example, the variable $ingredient is used for iterating through the results of the expression in the for clause.|
|..||Selects the parent element of the current node; used in the example to select the entire
|where||Uses a Boolean expression to filter the results; similar to SQL’s WHERE clause .|
|return||Constructs the output of the FLWOR expression?roughly analogous to SQL’s select list.|
The FLWOR mnemonic also includes two optional clauses not used above:
- let: Associates a single variable (rather than an iterator) with the results of an expression (not yet available in SQL Server 2005 as of Beta 3).
- order by: Orders results very much like the ORDER BY clause in T-SQL.
Of course, this is just the smallest taste of the power of FLWOR expressions. But for you DBAs, the expressions should be fairly intuitive and well worth the time it takes to master them.
Next I want to talk about data manipulation. Imagine that CookieCo’s insurance company calls (maybe after a couple of mishaps in the bakery). The agent insists on adding a safety note in the directions portion of every cookie recipe.
Inserting the new step in the relational database is easy enough, although it requires two statements because the instructions are numbered. First, you need to make room in the sequence. Then you can insert the new step into the first position.
UPDATE dbo.DirectionSET Sequence = Sequence + 1INSERT dbo.Direction (Sequence,Text,Direction2Cookie)SELECT1,'Take a moment to ensure the safety of your work area.',IDFROM dbo.Cookie
In XML, position is meaningful, so the equivalent XQuery has only one step:
UPDATE dbo.CookieSET CookieInXML.modify ('insert
Take a moment to ensure the safety of your work area.as first into (/cookie/directions) ')
Table 4. A first look at XML DML: XML data manipulation language is similar to its relational equivalent.
|insert||XPath equivalent of SQL’s INSERT statement.|
|||Predicate used for selecting the Nth node. Since tables contain rows with no inherent order, SQL has no built-in equivalent.|
All of which is pretty straightforward. But take another look at the last XPath expression:
Why is the whole thing encased in parentheses? The reason is that the target of the insert clause must be a single node. Without the parenthesis, the bracketed numeral would specify the first
Now imagine that the bakers go on strike. They feel insulted by the safety information, and refuse to pick up their aprons until it’s removed. After days of tense negotiations, the CookieCo management decides to delete the safety notice rather than undermine its relationship with the union.
Below is the T-SQL to remove the first step from every recipe. Again, this SQL script requires two statements: one to do the deletion and one to reorder the sequence.
DELETE dbo.DirectionWHERE Sequence = 1UPDATE dbo.DirectionSET Sequence = Sequence - 1
And again, XQuery equivalent only needs a single statement:
UPDATE dbo.CookieSET CookieInXML.modify ('delete /cookie/directions/step')
Simple enough, right? An XPath expression specifies the first step of each set of directions; the delete keyword deletes the results of the expression.
Finally, we’ll do an update. This time, a new heath craze, The Vanilla Diet, sends vanilla costs through the roof. To get a handle on the crisis, management decrees that all recipes should cut back immediately.
Here’s the T-SQL:
UPDATE dbo.IngredientSET Quantity = .05WHERE Name = 'Vanilla Extract'
And here’s the XQuery:
UPDATE dbo.CookieSET CookieInXml.modify ('replace value of (/cookie/ingredients/ingredient[name = "Vanilla Extract"]/quantity/text()) with .05 ' )
Table 5. More XML DML: XQuery’s update syntax works a lot like SQL’s.
|replace value of||Analogous to the SET keyword in SQL’s UPDATE statement.|
|text()||Used because the RecipeInXml field is untyped XML. The function ensures that the target will be treated as text rather than complex content.|
If the job of a language is to make easy things easy, XQuery is a great success. In SQL Server 2005, fields of type XML have methods. There are five methods total; we’ve worked with three: exist, modify, and query. Each method takes an XQuery in a string as an argument. The simplest XQuery is just an XPath expression (which can be quite powerful despite the simplicity) and for more complicated queries, you can use FLWOR expressions, which work similar to SQL queries.
XQuery is every bit as ambitious as SQL, so it’s impossible to do it justice in fewer than 2,000 words. But the fact that many of the XQuery design team are database professionals should give the rest of us SQL-nerds a head start.