Smarten Up Your DataSets with XML Schema

You might as well face it; your data model is going to change. You can whine about it. You can fight with the DBA and the requirements team. Or you can get ready for the change before it happens. This article focuses on the last option, giving you some techniques to build a more flexible data layer.

By now, you’re probably sold on the benefits of the typed DataSet. The accessor methods make client code friendlier to write and easier to debug. Best of all, Visual Studio creates the typed classes for you automatically. But the typed DataSet can lack flexibility. DataSets are often described as an “in-memory database,” and that may be a good description, because they seem to have all the shortcomings of a relational database. These shortcomings can include lack of code reuse and difficulty interfacing with object-oriented systems.

Worst of all, DataSet lacks inheritance, so you often end up with the same fields or even the same tables defined more than once. As your DataSets grow more complex, two potential problems emerge:

  • All the tables are defined independently, no matter how many fields they might have in common. It’s a little like the bad old days before true object-oriented programming.
  • There’s no obvious method for sharing a table definition across multiple DataSets.

Overcoming these limitations is relatively easy if you know a little XML Schema Definition language (XSD, also known as XML Schema). In case you’re new to XML Schema, this article includes a quick introduction. Next, you’ll see how to tweak the XSD documents created by Visual Studio to create custom, reusable XSD types. Finally, even XML can’t solve all the world’s problems (no matter what the marketing departments say), so it’s worth taking a look ahead to the next release of Visual Studio to see what it has to offer for data-layer developers.

XSD for Short Attention Spans
XSD is an XML-based language used to define a set of rules that apply to other XML documents. ADO.NET uses a lot of XML under the hood, so it makes sense that Visual Studio uses XML Schema to define the typed DataSets that it generates.

The easiest way to discover a little more about XSDs is to explore. To begin, drag a table or a view from the Server Explorer into the graphical designer. Figure 1 shows a screenshot of the sample project’s visually-designed DataSet.

Figure 1. Auto-Generated DataSet-Designer View: The figure shows an ordinary dataset, created by dragging tables from the Server Explorer into Visual Studio’s visual designer.

Even if you’ve never examined the code, you’ve probably seen the XML tab at the bottom of the DataSet designer. Click on the XML tab and take a look at what’s going on behind the scenes. You should see something similar to the code in Listing 1.

The W3C created the XSD specification as a language capable of defining a wide variety of XML documents, not just datasets, so not every tag is useful or interesting to you as a DataSet consumer. Some tags you’ll use only as placeholders; others you won’t use at all. Table 1 shows some of the most common XML Schema tags you’ll work with for modifying DataSets. For a more detailed explanation of XSD, check out this tutorial.

Table 1. XML Schema Basics

| Element | Description |
| --- | --- |
| element | Defines the “leaf level” of an XML document. Used mostly for creating columns in ADO.NET schemas. |
| schema | The root tag, which contains just about everything else in the document. |
| complexType | Used to define anything more complex than a single column, such as DataSets and DataTables. |
| sequence | Defines an ordered collection of elements, such as a series of columns. For ADO.NET developers, this element functions primarily as a placeholder. |
| include | Adds an external schema to the current document. Similar to the using keyword in C#. |

Here’s the XSD schema for a simple table definition:
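As a minimal sketch of that shape, assuming a hypothetical Customers table with ID and Name columns (these names are illustrative and not taken from the sample project), a table definition might look roughly like this:

```xml
<!-- Hypothetical table: the Customers name and its columns are assumptions -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The outer element plays the role of the table -->
  <xs:element name="Customers">
    <!-- Anonymous (unnamed) complexType, defined in-line -->
    <xs:complexType>
      <!-- The sequence is a placeholder that holds the columns -->
      <xs:sequence>
        <!-- Each leaf-level element is a column -->
        <xs:element name="ID" type="xs:int" />
        <xs:element name="Name" type="xs:string" minOccurs="0" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The schema, element, complexType, and sequence tags from Table 1 all appear here. Only the innermost element tags describe actual columns.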

                                                                  

Customizing the XSD is easier than you might think. In fact, it’s a lot like programming in .NET. You start by defining types and giving them attributes. And, just like .NET, XSD allows you to extend types. Digging a little deeper into XSD gives you the power of code-reuse without giving up the ease of automatically-generated DataSets.

DataSet Inheritance
One data modeling technique that’s especially popular with larger data models is to name all primary keys the same. For example, suppose that every table in a database has a primary key column called ID. Another common field is a timestamp, often used for optimistic concurrency or merge replication.

As the data-layer developer assigned to the project, you need to build a strongly-typed dataset on top of that data model. Creating it in Visual Studio is easy: you just drag and drop tables from the Server Explorer onto the DataSet designer. But what happens if your DBA decides to add a new field to every table? Or rename the primary key? Or get rid of the timestamp field? A DBA can hack up an SQL script in no time to change all the tables at once. But it’s a little harder for a Visual Studio developer. The naive approach would be to manually edit every table. But that’s boring, repetitive work, the kind of work likely to introduce errors and inconsistencies into your application. Moreover, in a prototyping environment, the data model is likely to change, and change a lot. And manually editing every table is not the kind of work you want to do every day.

What you really need is a reusable type that will abstract the information common to every table. That way, if your DBA makes any kind of global change, such as renaming the primary key field, you’ll have to change it in only a single place.

To create this reusable type, add a new complexType tag directly underneath the schema tag of your XSD document. This complexType looks just like the rest of the table definitions that were created automatically, except that by default, Visual Studio creates all types as anonymous. That means it doesn’t name its types because they’re all defined in-line. Because you want to use the same type more than once, you can simply add a name attribute, like this:
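A minimal sketch of such a named type follows. The name tableBase is the one used later in this article; the ID and Timestamp column names and their XSD types are assumptions based on the data model described above.

```xml
<!-- Sketch only: column names and types are assumptions -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The name attribute is what makes this type reusable;
       without it the type would be anonymous -->
  <xs:complexType name="tableBase">
    <xs:sequence>
      <!-- Primary key shared by every table -->
      <xs:element name="ID" type="xs:int" />
      <!-- Timestamp column, assumed here to map to binary data -->
      <xs:element name="Timestamp" type="xs:base64Binary" minOccurs="0" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```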

                                                                                                                    

Think of the new type as a base class. It functions as an abstraction representing those columns common to every table. When fields common to multiple tables change, you need only change the base class. When you switch back to the visual designer, you’ll notice the new type.

Figure 2. Back to the Visual Studio Designer: Notice that the designer now shows the new complexType, which abstracts those fields common to all tables.

It’s a little surprising how harmless it is to make such alterations to the schema. Microsoft has done a great job of making sure that you can modify the schema either in the Visual Designer or in the source code side-by-side without either interfering with the other. Unlike developing Windows Forms, there’s nothing dangerous about alternating between the designer and the underlying schema. The formatting may change a little, or your white space may get eaten up, but those are the only things that will change.

Now comes the big payoff. If your DBA removes the timestamp field, or renames the primary key, or adds a new field to every table in the database, you don’t have to change every table manually. Just make your changes in the tableBase type and all of its subtypes will inherit the changes. Even better, you can extend the concept of inheritance to create a hierarchy with an arbitrary number of levels. Listing 2 shows a sample schema that uses inheritance to create a hierarchy three levels deep. Any change in a type will automatically appear in all of its subtypes.
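The following sketch shows one way such a hierarchy might look, using XSD's extension mechanism. Only tableBase comes from the text above; personBase, Students, and every column name are hypothetical and chosen for illustration.

```xml
<!-- Sketch only: personBase, Students, and all column names are hypothetical -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Level 1: columns shared by every table -->
  <xs:complexType name="tableBase">
    <xs:sequence>
      <xs:element name="ID" type="xs:int" />
      <xs:element name="Timestamp" type="xs:base64Binary" minOccurs="0" />
    </xs:sequence>
  </xs:complexType>

  <!-- Level 2: extends tableBase with columns shared by person-like tables -->
  <xs:complexType name="personBase">
    <xs:complexContent>
      <xs:extension base="tableBase">
        <xs:sequence>
          <xs:element name="FirstName" type="xs:string" minOccurs="0" />
          <xs:element name="LastName" type="xs:string" minOccurs="0" />
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Level 3: a concrete table that extends personBase -->
  <xs:element name="Students">
    <xs:complexType>
      <xs:complexContent>
        <xs:extension base="personBase">
          <xs:sequence>
            <xs:element name="EnrollmentDate" type="xs:dateTime" minOccurs="0" />
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

In this sketch, renaming ID inside tableBase would carry through to personBase and Students without touching either definition.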

It’s important to notice that none of this changes the behavior of the code generated by Visual Studio. Not at all. As the designer of the dataset, you’re far better prepared for a changing data model. You can now define a type in one place and extend it throughout your XML schema, just like a class in C#. But the abstraction remains invisible to client developers. They don’t know how much work you saved because to them, it appears as if all the alterations were manual.

Of course, there’s a downside as well. You’ve introduced some reusable types into the schema definition, but that reuse doesn’t translate into an OO-style inheritance tree in the typed DataSet. The type tableBase remains an artifact of the schema, one that a client developer will see only through its subtypes. In other words, this technique gives you code re-use, but not polymorphism.

Managing Multiple DataSets
As your data layer grows, you may find yourself managing multiple typed DataSet classes that overlap slightly, sharing tables between them. Working with multiple datasets means extending the concept of reusable types across multiple XSD documents.

As an example, I’ll show you how to build a data layer on top of Microsoft’s Pubs sample database. The sample project contains two datasets: one called EmployeesDS, which holds employee data, and another called SalesDS, which holds sales data. Both datasets share the Publishers table. By default each generated dataset would contain a schema defining the Publishers table, meaning that if the table changes, you’ll have to change the schema in two different places. Ideally, you need a way to define each type once, no matter how many times your application uses or extends that type. That way, when the shared Publishers table changes, you’ll only have to modify your code in one place.

To abstract the shared publishers type, first create a new schema document called DsBase.xsd. The schema doesn’t have to be a DataSet because you’re not using it to create any classes. Notice that the schema element needs relatively few attributes.
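As a rough sketch, the opening of such a document could be as small as the following. The exact attribute set is an assumption; only the file name DsBase.xsd comes from the text.

```xml
<!-- Sketch: the attribute set shown is an assumption.
     Note that no targetNamespace is declared. -->
<xs:schema id="DsBase"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <!-- shared type definitions go here -->
</xs:schema>
```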

   

It’s important to notice that I’ve removed the targetNamespace attribute from the schema. This technique is known as “chameleon” namespace design, and allows the DsBase document to take on the namespace of any document that uses it. Think of it as an abstract class. For another example of a chameleon namespace, take a look at these guidelines.

Here’s the completed base schema.
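A sketch of what that base schema might contain is shown below. The column names assume the layout of the publishers table in the Pubs sample database; the XSD types and minOccurs settings are illustrative assumptions.

```xml
<!-- Sketch of DsBase.xsd: types and minOccurs values are assumptions -->
<xs:schema id="DsBase"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <!-- Named type, so any including schema can refer to it -->
  <xs:complexType name="publishers">
    <xs:sequence>
      <xs:element name="pub_id" type="xs:string" />
      <xs:element name="pub_name" type="xs:string" minOccurs="0" />
      <xs:element name="city" type="xs:string" minOccurs="0" />
      <xs:element name="state" type="xs:string" minOccurs="0" />
      <xs:element name="country" type="xs:string" minOccurs="0" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```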

                                                                                                            

Even though you’re not creating a DataSet, note that the XML looks the same as in a DataSet schema.

To use the publishers type you just created, first import the base schema into your DataSets using the include tag.
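A minimal sketch of that include, assuming DsBase.xsd sits in the same folder as the DataSet schema and that the DataSet uses a tempuri.org namespace of the kind Visual Studio typically generates (both are assumptions):

```xml
<!-- Sketch: the namespace URI and relative path are assumptions -->
<xs:schema id="SalesDS"
           targetNamespace="http://tempuri.org/SalesDS.xsd"
           xmlns="http://tempuri.org/SalesDS.xsd"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <!-- Pulls in every definition from DsBase.xsd; because DsBase.xsd has no
       targetNamespace, its types take on this schema's namespace -->
  <xs:include schemaLocation="DsBase.xsd" />
</xs:schema>
```

The include element has to appear before any element or type definitions in the schema.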

      

Now, you can use everything from the included schema exactly as if it were defined locally. In the XSD file, replace the locally-defined Publishers table with an element of type publishers.
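The replacement might look something like the sketch below. The namespace URI and the surrounding DataSet element are assumptions modeled on a typical designer-generated schema; only DsBase.xsd, SalesDS, and the publishers type come from the text.

```xml
<!-- Sketch: namespace URI and DataSet wrapper are assumptions -->
<xs:schema id="SalesDS"
           targetNamespace="http://tempuri.org/SalesDS.xsd"
           xmlns="http://tempuri.org/SalesDS.xsd"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
           elementFormDefault="qualified">
  <xs:include schemaLocation="DsBase.xsd" />
  <xs:element name="SalesDS" msdata:IsDataSet="true">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <!-- Replaces the in-line Publishers definition with the shared type -->
        <xs:element name="Publishers" type="publishers" />
        <!-- other tables in the DataSet stay as generated -->
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The single Publishers line stands in for the whole in-line complexType that the designer would otherwise generate.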

      

Listing 3 shows a completed SalesDS.xsd which uses the imported publishers type.

Figure 3 shows a screenshot of the schema in the Visual Studio designer. Notice that you can’t edit the publishers type because it’s not defined in this schema.

Figure 3. Back to the Visual Studio Designer: Notice that you can’t edit the publishers type because it’s not defined in this schema.

What’s Next?
As I mentioned earlier, this technique can help, but there’s still something missing: polymorphism. Using XSD gives you a kind of inheritance, but wouldn’t it be nice to write generic code, for example, code that could handle either Student or Teacher types equally well? It’s certainly possible, but doing so requires using an Object/Relational (O/R) Mapping tool. Such tools provide developers with a less strict coupling between classes and the relational model. That is an ideal situation, because data modelers and software architects don’t always design under the same constraints. You could write an O/R tool, or buy one. But they’re not trivial to write (in fact, they can be fiendishly complex), and they’re not cheap to buy from a third-party vendor.

The good news is that Microsoft is working on its own O/R Mapper, known as ObjectSpaces, which is planned for inclusion with Visual Studio 2005 (codenamed Whidbey). This tool promises to include a set of classes similar to ADO.NET except with a looser coupling to the database. Until then, we’ll just have to get by using XSD.
