
Using XML Glue to Solve Big Integration Problems

The core asset of any company is its business data. In a typical company, business data is stored in many formats and across many systems and databases throughout the organization. Data related to one business object, such as a customer’s contact or ordering information, may reside in multiple locations and formats. The dispersed nature of a typical company’s data forces applications to connect to many different systems to build a view of business information.

Recognizing this problem, business managers attempted to increase efficiency and use existing data more effectively through automation. The result was additional custom-developed data stores and new data warehouses layered on top of legacy ones.

The distributed nature of a business’s data makes a streamlined access approach (and a single customer view, for example) nearly impossible. Any new development project must overcome the initial obstacle of obtaining the necessary business data.

Traditional Approaches
To solve the data exchange problem, IT departments began to experiment with ways to exchange data across their company and with partner companies. The solutions they found fell into three basic categories.

Packaged Applications	Some companies opted to use third-party packaged applications. Packaged applications often replaced dozens of applications within the company and relegated software maintenance concerns to the vendor. Implementation times were measured in years, however, and the packaged application often required businesses to modify their common practices to conform to the methods the third-party application preferred.
Software Adapters	Other companies turned to vendor-supplied adapters that connected older database formats to available software interfaces. These adapters contained the logic to make older data sources appear to be relational data sources. These companies discovered that this was an expensive solution, between the cost of the adapters (which could be fairly high, especially for mainframe sources) and the significant effort required to install and configure the packages.
Proprietary Data Exchange	Many companies created proprietary binary and text-based formats to facilitate data exchange. These companies soon faced the problem of having to create parsers and adapters for their various systems in addition to the proprietary data exchange formats. One such format, Electronic Data Interchange (EDI), was developed to exchange business data. EDI messages contain a string of elements, each of which represents a single item, such as cost or version number, separated by a delimiter and framed by a header and trailer. Companies could exchange EDI transmissions only over a high-bandwidth subscriber network called a Value-Added Network (VAN). The result was a costly proposition: custom code, proprietary EDI over a VAN, and the subsequent transmission fees.
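
To make the structure of such a delimited message concrete, here is a minimal Python sketch that reads a header, a series of elements, and a trailer, and re-expresses them as XML. The message layout, the `HDR`/`ITEM`/`TRL` segment names, and the `*` and `~` delimiters are hypothetical simplifications chosen for illustration; real EDI standards define their own segments and envelopes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified message. It only mimics the shape described above:
# delimited elements framed by a header and a trailer.
RAW_MESSAGE = "HDR*ORDER*0042~ITEM*WIDGET*19.95*3~ITEM*GASKET*2.50*10~TRL*2"


def edi_like_to_xml(raw, segment_sep="~", element_sep="*"):
    """Convert a delimited, framed message into an XML element tree."""
    segments = [s.split(element_sep) for s in raw.split(segment_sep) if s]
    header, *body, trailer = segments

    # The frame is checked before any data is trusted.
    if header[0] != "HDR" or trailer[0] != "TRL":
        raise ValueError("message must be framed by HDR and TRL segments")
    if int(trailer[1]) != len(body):
        raise ValueError("trailer count does not match the body segments")

    root = ET.Element("message", type=header[1], id=header[2])
    for tag, name, cost, quantity in body:
        item = ET.SubElement(root, tag.lower(), name=name)
        ET.SubElement(item, "cost").text = cost
        ET.SubElement(item, "quantity").text = quantity
    return root


order = edi_like_to_xml(RAW_MESSAGE)
print(ET.tostring(order, encoding="unicode"))
```

Every receiving system needs its own copy of this positional knowledge (which field is the cost, which is the quantity), which is the parser burden the paragraph above describes.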

These varying solutions have resulted in a patchwork of odd systems and subsystems with no clear roadmap and a general inaccessibility of business data. Creating a single view of business data remains a significant requirement in the business world today, and it is still a problem. What’s needed is a simple way to view aggregated structured and unstructured data pulled from multiple data sources scattered across a company or industry. Extensible Markup Language (XML) provides this capability.

Arrival of XML
In the last few years, companies have begun to use XML to move business documents over the Internet. Prior to XML, companies had to build custom code to pull data from their data sources, construct a message, and then pay transmission fees to send it to another application or to a vendor company’s application. Instead of using proprietary formats, companies can now use XML and standard tools to pull data into a queue.

XML-based business interchange has proven itself more flexible and less expensive to implement. Numerous XML parsers are available for free, and its common information model is widely supported by many tools. The architecture of XML Web services is an open Internet standard that allows communication between business systems and data sources and provides a way to expose data in back-end systems, thus leveraging the existing infrastructure.
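
As a rough illustration of exposing back-end data with freely available tools, the following Python sketch reads rows from a database and serializes them as XML using only the standard library. The in-memory SQLite database, the `customers` table, and its columns are assumptions made for the example; they stand in for whatever data source a company actually has.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory stand-in for a back-end data source (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Acme Corp", "Boston"), (2, "Globex", "Chicago")],
)


def customers_as_xml(connection):
    """Serialize every customer row as a <customer> element."""
    root = ET.Element("customers")
    cursor = connection.execute("SELECT id, name, city FROM customers ORDER BY id")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        customer = ET.SubElement(root, "customer")
        for column, value in zip(columns, row):
            ET.SubElement(customer, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_document = customers_as_xml(conn)
print(xml_document)

# Any consumer with an XML parser can read the result without knowing
# anything about the database that produced it.
parsed = ET.fromstring(xml_document)
names = [customer.findtext("name") for customer in parsed.findall("customer")]
```

The consumer at the end depends only on the element names, not on the storage engine, which is the decoupling that makes XML attractive as an interchange format.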

Companies that take advantage of XML can let their employees and clients connect directly to enterprise systems and data sources. A company’s ability to structurally identify sections of a document and data elements allows developers to create applications that can intelligently respond to user input. (To find out how XML supports these capabilities, see the sidebar “Why XML Solves the Integration Problem.”)

Connecting Users to Data
Interestingly, the technology fueling integration is also fueling an outgrowth of the network, one that makes it possible to make mobile employees full-fledged network clients. The maturation of wireless technology quickly introduced mobile clients, such as mobile phones and PDAs, operated by end users. Typically, a client requests data from a back-end data system. That data, once converted to XML, can be converted to WML (Wireless Markup Language), wrapped in SOAP, and displayed on the mobile device. Once the client receives the returned document, it performs application-specific actions on that returned data, such as applying a stylesheet, before presenting the data to the end user. By using XML, a single view of the requested company data can be presented to a client. This advancement has been critical to the success of wireless client computing.
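
The conversion step can be sketched in a few lines of Python. This example turns a small customer record into a one-card WML deck. The `<customer>` record and its fields are hypothetical, and a production system would more likely apply an XSLT stylesheet than hand-written code; the Python standard library has no XSLT processor, so the transformation is written out directly here.

```python
import xml.etree.ElementTree as ET

# Hypothetical record, as it might arrive from a back-end system.
CUSTOMER_XML = """
<customer id="1001">
  <name>Acme Corp</name>
  <phone>555-0100</phone>
  <lastOrder>2 widgets</lastOrder>
</customer>
"""


def customer_to_wml(customer_xml):
    """Build a one-card WML deck from a customer record."""
    customer = ET.fromstring(customer_xml)
    wml = ET.Element("wml")
    card = ET.SubElement(
        wml,
        "card",
        id="c" + customer.get("id"),
        title=customer.findtext("name"),
    )
    # Each field of the record becomes one paragraph on the card.
    for field in customer:
        ET.SubElement(card, "p").text = f"{field.tag}: {field.text}"
    return wml


deck = customer_to_wml(CUSTOMER_XML)
print(ET.tostring(deck, encoding="unicode"))
```

Because the source record is XML, the same data could be rendered for a desktop browser or a PDF by swapping only this presentation step.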

Mobile clients can also use Web services to connect to their back-end data. Companies can build smart client systems that use XML Web services to access information directly and dynamically, and then present the right information in a defined format where it is needed. Such systems give employees and customers a real solution to their data access needs. Using XML, the client application can send updated XML in a request to the server and effectively change the back-end data sources immediately.
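
The update path can be sketched the same way. In this minimal Python example, a client-supplied XML request is applied to a back-end store. The `<update>` request format and the dictionary standing in for the data store are assumptions for illustration, not part of any Web services standard.

```python
import xml.etree.ElementTree as ET

# Stand-in for back-end storage, keyed by customer id (hypothetical).
BACKEND = {"1001": {"name": "Acme Corp", "phone": "555-0100"}}

# Hypothetical request body sent by the client application.
UPDATE_REQUEST = (
    '<update customer="1001"><phone>555-0199</phone><city>Boston</city></update>'
)


def apply_update(store, request_xml):
    """Apply each child element of the request as a field update."""
    request = ET.fromstring(request_xml)
    customer_id = request.get("customer")
    if customer_id not in store:
        raise KeyError(f"unknown customer {customer_id!r}")
    changed = []
    for field in request:
        store[customer_id][field.tag] = field.text
        changed.append(field.tag)
    return changed


changed_fields = apply_update(BACKEND, UPDATE_REQUEST)
print(changed_fields, BACKEND["1001"])
```

A real service would validate the request against a schema and authenticate the caller before touching the store; both steps are omitted here for brevity.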

What About Unstructured Data?
Not all business data is contained within databases; if it were, integration would be far less challenging. Although we have some well-established methods for storing and validating structured data, unstructured and semi-structured data are more difficult to handle.

Businesses’ structured data consists of information that is organized so that it can be easily located, searched, and updated. Structured data is most often contained within databases, but this type of data represents only a portion of any company’s entire business data. In addition to the structured data, a business will typically create, own, and distribute a significant amount of unstructured and semi-structured data too.

Unstructured data can be defined as any data that has no convenient technology or tool to parse the information into elements and provide access to the information. An image is one example of true unstructured data because there is no method to isolate one element of an image. You cannot, for example, isolate an individual’s face in a group picture.

Businesses’ semi-structured data consists of white papers, letters, marketing materials, reports, memorandums, research, presentations, Web pages, and e-mails. The data is semi-structured because tools and technologies exist that define the elements of the data and present it in a defined layout. For example, browsers use a document object model (DOM) to parse the HTML contained within a Web page and present it as defined.
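
To show what "tools that define the elements of the data" means in practice, this Python sketch pulls the headings out of a Web page. It uses the standard library's event-based `HTMLParser` rather than a full DOM, and the sample page is invented for the example.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collect the text of heading elements as (tag, text) pairs."""

    WANTED = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.current = None  # heading tag currently open, if any
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self.current = tag
            self.found.append([tag, ""])

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        # Text is kept only while a heading element is open.
        if self.current:
            self.found[-1][1] += data


# Invented sample page.
PAGE = (
    "<html><body><h1>Quarterly Report</h1><p>Sales rose.</p>"
    "<h2>Outlook</h2><p>Stable.</p></body></html>"
)

extractor = HeadingExtractor()
extractor.feed(PAGE)
extractor.close()
headings = [(tag, text.strip()) for tag, text in extractor.found]
print(headings)
```

The markup gives the parser enough structure to find the headings, but nothing in it says what the paragraphs mean, which is why such documents count as semi-structured rather than structured.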

A business’s unstructured and semi-structured data can contain a significant amount of text that cannot be reliably accessed by multiple users. The unstructured or semi-structured format of the data contributes to its inaccessibility and creates a conversion problem for any business, but this data is also vital to a company’s success.

XML Variants for Industry-specific Problems
Financial services is one industry that is drowning in data, and one that has turned to the XML family of technologies for an integration solution. Mutual funds are managed by teams of portfolio managers who must sift through massive amounts of information every day, including research reports, news clips, financial statements, e-mails, and voice mail messages, to make informed investment decisions. Every manager must also assimilate company press releases, information from conference calls and phone conversations, and new SEC filings.

IT departments at financial firms are looking for ways to organize financial research so it can be analyzed efficiently and intuitively. Some of the roadblocks include the lack of a single source to collect and store stock research information, the sheer magnitude of research information sent each day, and the cost to store and manage paper reports. Because the research data is handled in various ways by individual companies, attempts to categorize, aggregate, or distribute financial research have not been successful.

Organizations started to envision a system that would present the research data to their employees and customers in a single view. Before the vision could work, everyone needed to speak a common language; a common information model and a centralized Web resource for presenting the research were necessary. The industry designed and proposed an XML schema specifically for these purposes.


Figure 1. Handling Source Data Before and After RIXML: With RIXML, data goes directly from source to portal, regardless of its format.

The Research Information Exchange Markup Language (RIXML) describes how buy- and sell-side financial firms will exchange investment and financial research. RIXML provides an open, extensible, and flexible standard to process, aggregate, and distribute the growing body of research information. If it is well implemented, it can access and aggregate data, eliminating the time-consuming task of manually tagging an entire database.

Typically, brokerage houses and research firms deliver research information and data using e-mail and FTP. In some cases, the volume can be as high as tens of thousands of messages each day. As new research is received, by e-mail or FTP for example, it can be tagged with RIXML and incorporated into the available research. Once RIXML is applied to the data, there is no longer a need to write hundreds of custom applications to convert the incoming content.

After RIXML is applied to the stored legacy data and the incoming data, it can be presented in an aggregated view. This collected view can be presented in many ways, but a Web portal is one method. The collected data can also be published to a PDF and distributed to both analysts and clients.
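
The tag-then-aggregate flow can be sketched as follows. The element names used here (`report`, `issuer`, `sector`, `rating`) and the sample research items are hypothetical and are not taken from the RIXML schema; the sketch only illustrates how uniformly tagged research can be grouped into a single view regardless of how it arrived.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented research items, as they might arrive over different channels.
INCOMING = [
    {"source": "email", "issuer": "Example Motors", "sector": "Auto Components",
     "rating": "Outperform", "title": "Q3 preview"},
    {"source": "ftp", "issuer": "Sample Tire Co", "sector": "Auto Components",
     "rating": "Neutral", "title": "Pricing update"},
    {"source": "email", "issuer": "Demo Bank", "sector": "Financials",
     "rating": "Neutral", "title": "Rate outlook"},
]


def tag_research(item):
    """Wrap one research item in a uniform (hypothetical) XML structure."""
    report = ET.Element("report", source=item["source"])
    for name in ("title", "issuer", "sector", "rating"):
        ET.SubElement(report, name).text = item[name]
    return report


def aggregate_by_sector(reports):
    """Group (issuer, rating) pairs by sector for a portal-style view."""
    view = defaultdict(list)
    for report in reports:
        view[report.findtext("sector")].append(
            (report.findtext("issuer"), report.findtext("rating"))
        )
    return dict(view)


tagged = [tag_research(item) for item in INCOMING]
portal_view = aggregate_by_sector(tagged)
for sector, entries in portal_view.items():
    print(sector, entries)
```

Once every item carries the same tags, the aggregation code does not need to know whether a report arrived by e-mail or FTP.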

[Sample RIXML listing; the element tags did not survive extraction. Remaining values: Auto Parts & Equipment; Auto Components; Automobiles & Components; Consumer Discretionary; Autos, Auto Parts & Tires; Magna International; Outperform; Neutral.]

The beauty of the RIXML solution is that companies can create a system that organizes and collates the information so that employees can avoid irrelevant data and sift through the meaningful information in an efficient manner. Data coming from multiple sources in multiple formats is no longer cause for writing custom code to ensure that the data is stored and then presented to its intended audience.

Because the data can be collected and stored efficiently, a company’s analysts can be presented with new data each morning. When analysts want to inform their clients about a buy or sell recommendation, they can present to their clients the same information they used to reach their recommendation. The analysts and their clients are then on the same page, can see the same data, and make informed decisions.
