Application architects design distributed applications based largely on their computing resources and network infrastructure. The goal is to ensure that users have ready access to computing resources, and that those computing resources have access to application data. While the object-oriented development approach is useful for developing applications in general, a data-centric approach is better for designing and developing distributed applications.
This article introduces the data-centric approach, explaining how to design with data-centric principles and implement data-centric applications. You will learn how data-centric design and data-oriented implementation can enable more robust and scalable distributed systems that also are easier to maintain and enhance over time.
Editor's Note: The authors, Dr. Gerardo Pardo-Castellote and Supreet Oberoi, are, respectively, the CTO and VP Engineering of Real-Time Innovations, Inc., a vendor of real-time information networking solutions. We have selected this article for publication because we believe it to have objective technical merit.
The Challenge of Building Distributed Applications
Consider a programmatic financial-trading system with subsystems running trading algorithms based on market data that they receive from data feeds such as Reuters. The data feeds provide information such as open market orders, daily highs and lows for a symbol, volume, number of trades, and closes, among other things. In addition, many of the subsystems produce data that is essential to the successful operation of some others, or that at the very least would significantly enhance their ability to produce quality results. For example, one of the subsystems can produce information such as long CCI index and short CCI index that is useful to other systems making trading decisions. These systems are connected through a standards-based network, so data communication between endpoints is common and continuous.
One way to ensure that these disparate systems can use data from one another across a network is to examine the data requirements of each distributed endpoint and draw one-to-one interfaces between those endpoints. Once a relationship is established, the data is passed between the two endpoints, likely using a polling or producer/consumer architecture. However, there are significant challenges in building and maintaining distributed applications with this approach.
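A one-to-one polling interface of the kind described above can be sketched in a few lines of Python. This is only an illustration, not code from any real trading system: the class names `MarketFeed` and `TradingSubsystem`, the symbol `ACME`, and the quote fields are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MarketFeed:
    """Hypothetical producer endpoint holding the latest quote per symbol."""
    quotes: dict = field(default_factory=dict)

    def latest(self, symbol):
        # Return the most recent quote, or None if nothing has arrived yet.
        return self.quotes.get(symbol)


@dataclass
class TradingSubsystem:
    """Hypothetical consumer hard-wired to exactly one producer."""
    feed: MarketFeed  # direct reference: this is the one-to-one interface
    seen: list = field(default_factory=list)

    def poll(self, symbol):
        # Polling: the consumer asks its specific producer for data.
        quote = self.feed.latest(symbol)
        if quote is not None:
            self.seen.append((symbol, quote["high"], quote["low"]))
        return quote


feed = MarketFeed()
feed.quotes["ACME"] = {"high": 101.5, "low": 99.0, "volume": 12000}
trader = TradingSubsystem(feed)
trader.poll("ACME")
```

Note that the consumer holds a direct reference to one particular producer and knows the exact shape of its data. That coupling is what the following sections examine.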
Complex Lifecycle Management
The application architect cannot assume that the network and application architecture is static and unchanging. If a market-data feed is swapped for an upgraded pipe, the targeting algorithmic code has to be adapted to recognize and work with the new market-data format. Further, trading subsystems could be co-dependent on one another; they may use data from one another to successively refine initial estimates, or to locate and successively target trading prices and volumes.
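To make the feed-upgrade problem concrete, here is a minimal sketch in Python. Both message layouts are invented for illustration (the article does not specify any real feed format), and `parse_v1` and `try_parse` are hypothetical helper names.

```python
def parse_v1(message):
    # Assumed original layout (hypothetical): "SYMBOL,HIGH,LOW,VOLUME"
    symbol, high, low, volume = message.split(",")
    return {
        "symbol": symbol,
        "high": float(high),
        "low": float(low),
        "volume": int(volume),
    }


def try_parse(message):
    # Consumer code written against the old layout only.
    try:
        return parse_v1(message)
    except ValueError:
        # Wrong field count or non-numeric fields: the message is unusable.
        return None


old_message = "ACME,101.5,99.0,12000"
new_message = "ACME|12000|99.0|101.5"  # assumed upgraded layout (hypothetical)

parsed_old = try_parse(old_message)
parsed_new = try_parse(new_message)
```

In this sketch the old message parses normally, while the new one cannot be parsed at all until the consumer's code is changed. Every consumer wired directly to the feed would need the same change.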
Designing these distributed systems is complex, and maintaining them throughout the system lifecycle can be a technical and logistical nightmare. Every upgrade and system modification requires extensive testing to ensure that the changes did not introduce incompatibilities. More often than not, code changes will be required as well.
Coordinating between servers, and between servers and clients, is complex and technically difficult. When endpoints are tied together by direct one-to-one interfaces, upgrading any part of the system requires code changes and exhaustive testing of the new configuration. Further, if new data is made available on the network, the other endpoints will require additional code even if they do not plan to use the data. All one-to-one connections will need code modification and extensive testing.
This characteristic makes scaling distributed applications very challenging. Because one-to-one data connections are fragile, and because each new endpoint can require changes to the code of the endpoints it connects to, expanding the application across a larger network with more endpoints is technically challenging.
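A rough way to see the scaling problem is to count connections. The sketch below assumes the worst case, in which every endpoint exchanges data with every other endpoint. Real systems are usually sparser, so treat the numbers as an upper bound. The function names are illustrative.

```python
def point_to_point_links(endpoints):
    # Each pair of endpoints needs its own interface: n * (n - 1) / 2 pairs.
    return endpoints * (endpoints - 1) // 2


def links_added_by_next_endpoint(endpoints):
    # Extra interfaces to build and test when one more endpoint joins.
    return point_to_point_links(endpoints + 1) - point_to_point_links(endpoints)


for n in (4, 10, 50):
    print(n, point_to_point_links(n), links_added_by_next_endpoint(n))
```

Under this assumption the number of interfaces grows quadratically with the number of endpoints, and each new endpoint adds one interface for every endpoint already on the network.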