Beyond Tables: Dealing with the Convergence of Relational and XML Data

Developers are under increasing pressure to work with heterogeneous information. The advent of XML provided a data representation suitable for both regular (database tables) and irregular content, and was a significant step forward, but representation alone is not enough. We need a powerful standardized method for querying that data as well. XQuery fulfills that need.

Few serious applications are considered enterprise-worthy without a core database engine backed by an extensive, normalized, and optimized relational database architecture. Traditionally, such database applications rely on SQL queries and statements to retrieve and update data in the back-end store. But that's about to change. The W3C XQuery language is accelerating towards "Recommendation" status; SQL developers should take note. According to a March 2005 Developer Survey conducted by my employer, DataDirect, XQuery is quickly becoming a required, core component both in emerging enterprise application architectures such as service-oriented architectures (SOA) and in more established enterprise architectures such as J2EE. XQuery is the best approach for integrating XML and relational data and will quickly become as ubiquitous in the future as SQL is today.

Setting the Scene
The invention of XML and the Internet created opportunities for businesses to exchange information in a way previously possible only through narrowly defined data interchange formats such as EDI (Electronic Data Interchange), typically governed by organizations such as the Data Interchange Standards Association (DISA). XML is now considered the de facto standard for retrieving and exchanging data. However, the rapid growth of XML and the increasing proliferation of hierarchical messages present a fresh set of challenges to established enterprise applications and to developers who have historically built their business processes around relational databases.

Business-critical data is typically stored in relational database management systems (RDBMSs). By centralizing storage and distribution of data, relational databases consolidated data security, integrity, and control within a single system. Many database systems are well-established and reliable enough that their existence is entrenched; they're unlikely to disappear any time soon. In spite of this dominant position, the growth of XML is forcing modern business applications to function seamlessly with both relational and XML data.

Relational data and structured XML data have very different models. Before analyzing various data integration approaches, it is worth revisiting the organization of the data models.

The Relational Model
Relational data is organized according to a set of cardinalities and logically defined dependencies known as normalizations. A table is required to express a single defined set of data, with each table containing a set of records organized by rows, or tuples. Data in each of these tables is organized by columns, which may serve as keys. A key column uniquely identifies the data in the rows of a table.

You can establish simple relationships between data items by storing them within the same tuple, or more complex relationships by using separate tables linked by common keys.
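The key-based relationships described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original article: the `customer` and `purchase` tables, their columns, and the sample rows are all hypothetical.

```python
import sqlite3


def build_orders_db():
    """Create an in-memory database with two tables linked by a common key.

    The schema and data are hypothetical and exist only for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,   -- key column: identifies each row
            name        TEXT NOT NULL
        );
        CREATE TABLE purchase (
            purchase_id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            item        TEXT NOT NULL
        );
        INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO purchase VALUES
            (10, 1, 'widget'), (11, 1, 'gear'), (12, 2, 'sprocket');
    """)
    return conn


def purchases_by_customer(conn):
    """Reassemble the relationship explicitly by joining on the common key."""
    rows = conn.execute("""
        SELECT c.name, p.item
        FROM customer AS c
        JOIN purchase AS p ON p.customer_id = c.customer_id
        ORDER BY c.name, p.item
    """)
    return rows.fetchall()
```

Note that the relationship between a customer and its purchases is not stored anywhere as a structure; it only appears when a query joins the two tables on `customer_id`.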

XML Model
XML data turns the relational data model almost completely on its head. Relationships between data are intrinsic, as opposed to the more explicit relationship expressions used in the relational model. XML documents use parent/child relationships and element/attribute relationships. Hierarchical data relationships are more obvious than relational relationships; they are based on the relative position of each node within the document and are easily discernible.
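To make the contrast concrete, here is a small sketch using Python's standard xml.etree.ElementTree module. The document, element names, and attribute names are hypothetical; the point is only that the relationship is carried by nesting rather than by keys.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: purchases belong to the customer simply because
# they are nested inside it. No key column is needed.
ORDER_XML = """
<customer name="Acme">
  <purchase id="10"><item>widget</item></purchase>
  <purchase id="11"><item>gear</item></purchase>
</customer>
"""


def items_for_customer(xml_text):
    """Walk the parent/child structure and return (customer, id, item) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (root.get("name"), purchase.get("id"), purchase.findtext("item"))
        for purchase in root.findall("purchase")  # direct children only
    ]
```

No join is required here: the position of each `purchase` element inside `customer` already expresses the relationship.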

Specialized Optimizations
Relational tables and XML documents are both powerful ways to represent relationships between data, but each is optimized to provide a particular benefit. Relational tables, coupled with keyed columns, are optimized for efficient data retrieval with minimum fuss. XML documents are optimized to express the intrinsic relationships of data that together make up an XML document.

Bringing Together Tables and Documents
Why do enterprises need both XML and relational data technologies? The answer is that enterprise application developers must leverage existing investment in applications based on the relational model while quickly adapting them to the heterogeneous and message-driven nature of XML data.

XQuery: The Resolution
XQuery is the best approach for integrating XML and relational data. The W3C XQuery specification provides a native XML query language that integration platforms and components can use to solve this problem. XQuery levels the data integration playing field by providing a single interface that lets developers access multiple data sources under a unifying data model. Middleware products are set to deliver Java components that provide developers with extensive options for presenting and exchanging their relational data as XML and for processing relational and XML data together.

The data integration landscape will level further as RDBMSs embed XQuery support as a means to expose relational data as an XML data source, implicitly increasing data portability and accessibility via XQuery itself. RDBMSs without integrated support for XQuery will continue to delegate that responsibility to the middle tier so that they can participate equally in data integration.
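The idea of exposing relational data as an XML data source, and then processing it alongside native XML under one data model, can be approximated in a few lines. The sketch below uses Python's sqlite3 and xml.etree.ElementTree modules rather than XQuery, so it illustrates the concept only; the `rows_as_xml` helper, the `<row>` element layout, and the sample data are assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_demo_db():
    """Build a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    return conn


def rows_as_xml(conn, table, columns):
    """Render each row of a table as a <row> element under a root named after the table.

    The table and column names are interpolated into SQL, so they must be
    trusted identifiers, never user input.
    """
    root = ET.Element(table)
    query = "SELECT {} FROM {}".format(", ".join(columns), table)
    for row in conn.execute(query):
        row_el = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            ET.SubElement(row_el, column).text = str(value)
    return root


def names_with_shipments(customers_el, shipments_xml):
    """Combine the XML view of relational rows with a native XML document."""
    shipments = ET.fromstring(shipments_xml)
    shipped_ids = {s.get("customer") for s in shipments.findall("shipment")}
    return [
        row.findtext("name")
        for row in customers_el.findall("row")
        if row.findtext("customer_id") in shipped_ids
    ]
```

Once both sources are trees, the same navigation calls work on each of them, which is the property the article attributes to a unifying data model.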

Before XQuery, developers' design strategies for integrating relational and XML data were limited to:

  • Shredding the XML data, that is, decomposing it into individual table columns in a relational database. This process flattens the built-in data hierarchy and (potentially) loses the intrinsic internal data relationships. The original XML document itself is also lost, although it can, in some cases, be reproduced from the shredded data. If preserving the XML structure is unimportant, shredding is a reasonable approach for combining XML and relational data.
  • Storing the XML data as unstructured data in a relational database—using the CLOB (Character Large Object) data type. CLOB columns can store an XML document in its entirety, thus preserving both the document and its internal relationships. However, treating the XML document as nothing more than a text file severely compromises the ease with which it can be queried and searched.
  • Storing the XML data as a structured XML document in a relational database.

The last alternative, structured XML document storage, enables a close relationship between XML data and relational data within a traditional relational database. This method preserves the document structure and maintains the hierarchical relationships within the document, but relies on direct support for structured XML as part of the database architecture. Successfully executing a concrete relational and XML data integration strategy requires a consistent, standards-based approach; however, the most widely used relational databases currently have wildly varying levels of support for structured XML and relational data co-existence, thus making portable data integration difficult.
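The trade-off between the first two strategies can be sketched as follows. This is an illustration under stated assumptions: SQLite has no dedicated CLOB type, so a `TEXT` column stands in for one, and the table names, column names, and sample document are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = (
    '<customer name="Acme">'
    '<purchase id="10"><item>widget</item></purchase>'
    '<purchase id="11"><item>gear</item></purchase>'
    "</customer>"
)


def store_both_ways(xml_text):
    """Store one document twice: shredded into columns, and whole as text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchase_shredded (customer TEXT, purchase_id INTEGER, item TEXT)"
    )
    # TEXT stands in for a CLOB column here (an assumption of this sketch).
    conn.execute("CREATE TABLE document_clob (doc_id INTEGER PRIMARY KEY, body TEXT)")

    root = ET.fromstring(xml_text)
    # Shredding: the hierarchy is flattened into one row per purchase.
    conn.executemany(
        "INSERT INTO purchase_shredded VALUES (?, ?, ?)",
        [
            (root.get("name"), int(p.get("id")), p.findtext("item"))
            for p in root.findall("purchase")
        ],
    )
    # CLOB-style storage: the document is kept intact but opaque to SQL.
    conn.execute("INSERT INTO document_clob (body) VALUES (?)", (xml_text,))
    return conn


def find_customers_shredded(conn, item):
    """Shredded data can be searched with plain SQL."""
    rows = conn.execute(
        "SELECT customer FROM purchase_shredded WHERE item = ?", (item,)
    )
    return [row[0] for row in rows]


def find_customers_clob(conn, item):
    """Text-stored documents must be fetched and parsed before searching."""
    hits = []
    for (body,) in conn.execute("SELECT body FROM document_clob"):
        root = ET.fromstring(body)
        if any(p.findtext("item") == item for p in root.findall("purchase")):
            hits.append(root.get("name"))
    return hits
```

The shredded table answers the question with a single `WHERE` clause but no longer holds the document, while the text column preserves the document and forces every search to parse it outside the database.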

This lack of native database standardization and support for XML means that developers are likely to turn to middle-tier components to obtain a consistent integration end-point for relational and XML data.

XQuery and Middle-tier Components
The middle tier will likely emerge as the sweet spot for developers to establish an integration component end-point (iCE) as a means to integrate a set of distributed data sources, both relational and structured. This iCE component will typically exist in (but will not be restricted to) the middle tier between the application (client tier) and data source components, and will encapsulate a W3C XQuery implementation and runtime. Such middle-tier XQuery implementations will help application developers address their relational and XML data integration challenges.

XQuery offers the best integration technology because it leverages the structure of XML to allow applications to express queries across all kinds of XML data regardless of the data's location. Weaving together distributed relational data sources with XML data also provides a solid foundation on which to migrate and build applications towards SOA deployments.

Readers may be wondering whether such integration truly requires creating yet another query language. To answer this question, it's worth considering some history.

Editor's Note: The author, Jonathan Bruce, is a Technology Evangelist for DataDirect Technologies, a vendor of XML processing components and database drivers. We have selected this article for publication because we believe it to have objective technical merit and valid insights.
