Beyond Tables: Dealing with the Convergence of Relational and XML Data

ew serious applications are considered enterprise-worthy without a core database engine backed by an extensive, normalized, and optimized relational database architecture. Traditionally, such database applications rely on SQL queries and statements to retrieve and update data in the back-end store. But that’s about to change. The W3C XQuery language is accelerating towards “Recommendation” status; SQL developers should take note. According to this March 2005 Developer Survey conducted by my employer, DataDirect, XQuery is quickly becoming both a required and a core component in both emerging enterprise application architectures such as service-oriented architectures (SOA), and in more established enterprise architectures such as J2EE. XQuery is the best approach for integrating XML and relational data and will quickly become as ubiquitous in the future as SQL is today.

Setting the Scene
The invention of XML and the Internet created opportunities for businesses to exchange information in a way previously possible only through narrowly defined data interchange formats such as EDI (Electronic Data Interchange) typically governed by organizations such as the Data Interchange Standards Association (DISA). XML is now considered the de facto standard for retrieving and exchanging data. However, the rapid growth of XML and increasing proliferation of hierarchical messages presents a fresh set of challenges to established enterprise applications and developers who have historically built their business process around relational databases.

Business-critical data is typically stored in relational database management systems (RDBMSs). By centralizing storage and distribution of data, relational databases consolidated data security, integrity, and control within a single system. Many database systems are well-established and reliable enough that their existence is entrenched; they’re unlikely to disappear any time soon. In spite of this dominant position, the growth of XML is forcing modern business applications to function seamlessly with both relational and XML data.

Relational data and structured XML data have very different models. Before analyzing various data integration approaches, it is worth revisiting the organization of the data models.

The Relational Model
Relational data is organized according to a set of cardinalities and logically defined dependencies known as normalizations. A table is required to express a single defined set of data, with each table containing a set of records organized by rows, or tuples. Data in each of these tables is organized by columns, which may serve as keys. A key column uniquely identifies the data in the rows of a table.

You can establish simple relationships between tables by storing data within the same tuple, or more complex relationships by using separate relationships and common keys.

XML Model
XML data turns the relational data model almost completely on its head. Relationships between data are intrinsic as opposed to the more explicit relationship expressions used in the relational model. XML documents use parent/child relationships and element/attribute relationships. Hierarchical data relationships are more obvious than relational relationships; they are based on the relative position of each node within the document and easily discernable.

Specialized Optimizations
Relational tables and XML documents are both powerful ways to represent relationships between data, but each is optimized to provide a particular benefit. Relational tables, coupled with keyed columns, are optimized for efficient data retrieval with minimum fuss. XML documents are optimized to express the intrinsic relationships of data that together make up an XML document.

Bringing Together Tables and Documents
Why do enterprises need both XML and relational data technologies? The answer is that enterprise application developers must leverage existing investment in applications based on the relational model while quickly adapting them to the heterogeneous and message-driven nature of XML data.

XQuery: The Resolution
XQuery is the best approach for integrating XML and relational data. The W3C XQuery specification provides a native XML query language that integration platforms and components can use to solve this problem. XQuery levels the data integration playing field by providing a single interface that lets developers access multiple data sources under a unifying data model. Middleware products are set to deliver Java components that provide developers with extensive options for presenting and exchanging their relational data as XML and for processing relational and XML data together.

The gradual leveling of the data integration landscape will precipitate further by RDBMSs embedding XQuery support as a means to expose relational data as an XML data source, therefore implicitly increasing data portability and accessibility via XQuery itself. RDBMSs without integrated support for XQuery will continue to delegate the responsibility to the middle tier to ensure their equal participation in increased data integration.

Before XQuery, developers’ design patterns strategies for integrating relational and XML data were limited to:

  • Shredding (decomposing XML into relational tables) XML data into individual table columns in relational database?this process flattens the built-in data hierarchy, and (potentially) loses the intrinsic internal data relationships. The original XML document itself is also lost, although in can, in some cases, be reproduced from the shredded data. If preserving the XML structure is unimportant, shredding is reasonable approach for combining XML and relational data.
  • Storing the XML data as unstructured data in a relational database?using the CLOB (Character Large Object) data type. CLOB columns can store an XML document in its entirety, thus preserving both the document and its internal relationships. However, treating the XML document as nothing more than a text file severely compromises the ease with which it can be queried and searched.
  • Storing the XML data as a structured XML document in a relational database

The last alternative?structured XML document storage?enables a close relationship between XML data and relational data within a traditional relational database. This method preserves the document structure and maintains the hierarchical relationships within the document, but relies on direct support for structured XML as part of the database architecture. To successfully execute a concrete relational and XML data integration strategy requires a consistent, standards-based approach; however, the most widely used relational databases currently have wildly varying levels of support for structured XML and relational data co-existence, thus making portable data integration difficult.

This lack of native database standardization and support for XML means that developers are likely to turn to middle-tier components to obtain a consistent integration end-point for relational and XML data.

XQuery and Middle-tier Components
The middle tier will likely emerge as the sweet spot for developers to establish an integration component end-point (iCE) as a means to integrate a set of distributed data sources, both relational and structured. This iCE component will typically exist in (but will not be restricted to) the middle tier between the application (client tier) and data source components, and encapsulates a W3C XQuery implementation and runtime. Such middle-tier XQuery implementations will help application developers address their relational and XML data integration challenges.

XQuery offers the best integration technology because it leverages the structure of XML to allow applications to express queries across all kinds of XML data regardless of the data’s location. Weaving together distributed relational data sources with XML data also provides a solid foundation on which to migrate and build applications towards SOA deployments.

Readers may be wondering whether such integration truly requires creating yet another query language. To answer this question, it’s worth considering some history.

Editor’s Note: The author, Jonathan Bruce, is a Technology Evangelist for DataDirect Technologies, a vendor of XML processing components and database drivers. We have selected this article for publication because we believe it to have objective technical merit and valid insights.

Do We Really Need Another Query Language?
I think we do, and here’s why. Put simply, working with XML using SQL and available SQL extensions imposes significant developer productivity constraints, forcing developers to think using two vastly different data models. Combining XQuery’s ability to abstract any data source formatted as XML, and XQuery’s highly optimized language and manipulative capabilities frees developers from the pain of data integration and lets them concentrate on building data-rich applications.

As recently as the SQL-99 specification, SQL was extended to permit support for structured data types. Although it remains possible that SQL could be extended further to meet many of the XQuery requirements, thus allowing enterprises to maintain their significant investment in SQL code in their deployed applications, that’s unlikely to happen any time soon. Couple this fact with the following observations, and you begin to gain an understanding of why developers need XQuery:

  • Relational data can represent nested hierarchical XML data by using tables with structured types and foreign keys. To search an XML structure stored this way, it is necessary to know the specific key required to locate a specific data item. XML has a much less formal arrangement for data, one that does not require prior knowledge of where the data exists.
  • Relational data is regular and consistent, which allows the data to be described separately using metadata. XML data can be quite irregular but it is self-describing.
  • The result of querying relational data is presented in a table; however XQuery results are exclusively hierarchical.
  • Relational tables do not have any notion of ordering; XML documents regard ordering as a fundamental facet of the data description.
  • Traditional application development requires a significant investment in data integration time. Transposing developer cycles away from tackling any number of integration challenges using XQuery offers significant productivity boosts.
  • XQuery allows developers to work directly with XML, which exposes many developers to a new set of challenges.
  • XQuery was designed as separate language optimized to work with XML data rather than as an SQL extension.

XQuery Expressions
XQuery is not simply an XML flavor of SQL. Having discovered how XQuery came about, many readers are likely wondering what this new language can do. Here are a few highlights:

Fundamentally, XQuery is a declarative language that uses expressions as central components for assembling queries. Each query is a set of expressions that may be nested to assemble a desired result.

Developers who have worked with relational data before, most likely encountered JDBC or the ODBC API to channel SQL statements to your chosen RDBMS. In the following section, we introduce some fundamental concepts of XQuery and draw comparisons as to how they map to already familiar concepts in JDBC, ODBC or common languages such as Java or C.

  • Literal?a direct syntactic representation of an atomic value similar to a String object in the Java programming language or a source code representation of a value in C# such as an integer or string literal.
  • Function Call?a library method call in Java
  • Path Expression?a concept unique to XQuery, which is used to locate nodes on a given tree in an XML document. (A SQL developer might think in terms of a foreign key to express a specific table row.)
  • FLWOR Expressions?XQuery provides the ability to combine information from many sources, reshape the data, and present it as a result. FLWOR expressions (pronounced ‘flower’) stands for For, Let, Where, Order by, Return. These key XQuery expression clauses let you to specify a set of criteria to retrieve a desired set of information. In many respects, FLWOR expressions can be considered as equivalent to SQL SELECT…FROM…WHERE… statements, with the exception that FLWOR expressions are optimized to deal with XML data.
  • Conditional Expressions?Using a conditional expression permits expressions that use if…then…else… clauses familiar to developers in most languages. There are two ways to assemble a conditional expression in a typical JDBC application?either you can call a stored procedure through the CallableStatement interface to manipulate the data within the relational data store, or you factor out a number of result sets from SQL queries and apply the necessary conditions in Java. XQuery allows you to quickly and easily articulate conditional expressions.
  • Quantified Expression?In XQuery, there are two types of quantifiers to remember:
    1. Existential?An existential quantifier lets expression determine whether a sequence satisfies a particular result for each item in a result.
    2. Universal?A universal quantifier lets an expression determine whether every node in a sequence satisfies a particular result.

A JDBC application might typically contain a quantified expression in a stored procedure accessed through the CallableStatement interface. Typically the stored procedure is configured to return a Boolean value by applying a test between two or more criteria.

XQuery in a Nutshell
XQuery provides a flexible, well-rounded means to manipulate XML data. In the short term, relational data stores will continue to be the dominant data storage model given the widespread and considerable investment in established and highly optimized enterprise applications. But XML is here to stay, it is growing, and XQuery will let developers unlock its tremendous potential while leveraging the significant established base of relational data.

At the time of this article’s publication, the W3C XQuery specification is poised to move forward to full “Recommendation” status. Codification of this standard will trigger significant activity in the data integration space, in particular in the area of unifying relational and XML data. Future iterations of major RDBMSs will see XQuery become an integral part of tomorrow’s relational data stores. This presents the implicit advantage of surfacing relational data as XML and the provision of database-optimized XQuery engines for XML query processing. XQuery-enabled middle-tier components will quickly be able to leverage the database-level implementations while supplying the optimal integration end-point for an arbitrary set of data sources.

XML continues to find its way into more of today’s applications, with developers finding new innovative applications for XML an effective data integration technology. Fortunately, the XQuery language is impartial as to where the XML to be manipulated is stored?leaving the relational data base investments secure. The power of the XQuery language and the importance of middle-tier XQuery implementations will result in significant productivity gains for developers of evolving and deployed applications alike.

Share the Post:
Share on facebook
Share on twitter
Share on linkedin

Overview

Recent Articles: