A Data-Centric Approach to Distributed Application Architecture


Application architects design distributed applications based largely on their computing resources and network infrastructure. The goal is to ensure that users have ready access to computing resources, and that those computing resources have access to application data. While the object-oriented development approach is useful for developing applications in general, a data-centric approach is better for designing and developing distributed applications.

This article introduces the data-centric approach, explaining how to design with data-centric principles and implement data-centric applications. You will learn how data-centric design and data-oriented implementation can enable more robust and scalable distributed systems that are also easier to maintain and enhance over time.

Editor’s Note: The authors, Dr. Gerardo Pardo-Castellote and Supreet Oberoi, are the CTO and VP Engineering, respectively, of Real-Time Innovations, Inc., a vendor of real-time information networking solutions. We have selected this article for publication because we believe it to have objective technical merit.

The Challenge of Building Distributed Applications
Consider a programmatic financial-trading system with subsystems running trading algorithms based on market data that they receive from data feeds such as Reuters. The data feeds provide information such as open market orders, daily highs and lows for a symbol, volume, number of trades, and closes, among other things. In addition, many of the subsystems produce data that is essential to the successful operation of some others, or that at the very least would significantly enhance their ability to produce quality results. For example, one of the subsystems can produce information such as long CCI index and short CCI index that is useful to other systems making trading decisions. These systems are connected through a standards-based network, so data communication between endpoints is common and continuous.

One way to ensure that these disparate systems leverage data from one another across a network is by examining the data requirements of each distributed end-point and drawing one-to-one interfaces between those end-points. Once the relationship is established, the data is passed between those two end-points, likely using a polling or producer/consumer architecture. However, there are significant challenges in building and maintaining distributed applications with this approach.

Complex Lifecycle Management
The application architect cannot assume that the network and application architecture is static and unchanging. If a market-data feed is swapped for an upgraded pipe, the targeting algorithmic code has to be adapted to recognize and work with the new market-data format. Further, trading subsystems could be co-dependent on one another; they may use data from one another to successively refine initial estimates, or to locate and successively target trading prices and volumes.

Designing these distributed systems is complex, and maintaining them throughout the system lifecycle can be a technical and logistical nightmare. Every upgrade and system modification will require extensive testing to ensure that changes did not introduce incompatibilities. Code changes will be more likely than not.

Tight Coupling
Coordinating between servers, and between servers and clients, is complex and technically difficult. When endpoints are connected through direct one-to-one interfaces, upgrading parts of the system becomes complex, requiring code changes and exhaustive testing of new configurations. Further, if new data is made available on the network, the other endpoints will require additional code even if they do not plan to use the data. All one-to-one connections will need code modification and extensive testing.

This characteristic makes scaling distributed applications very challenging. One-to-one data connections are fragile, and every new endpoint can force changes to the underlying code of the endpoints it connects to, so expanding the application to a larger network with more endpoints is technically challenging.
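The scaling problem can be made concrete with a little arithmetic. The sketch below assumes the worst case, which the article does not state explicitly: every endpoint exchanges data directly with every other endpoint. Under that assumption the number of one-to-one interfaces to write and test grows quadratically. The function name is hypothetical.

```python
def point_to_point_links(endpoints: int) -> int:
    """Number of distinct one-to-one interfaces among `endpoints` nodes,
    assuming (worst case) that every pair of nodes exchanges data."""
    return endpoints * (endpoints - 1) // 2


# Pair each network size with the number of interfaces it implies.
growth = [(n, point_to_point_links(n)) for n in (5, 10, 50)]

for n, links in growth:
    print(f"{n} endpoints -> {links} one-to-one interfaces")
```

Real systems rarely connect every pair, but the trend is the point: each added endpoint may touch many existing interfaces rather than just one.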

The Data-Centric Approach
The one-to-one approach described above requires that the data producer and data consumer each know of the other's existence and location, and that they use identical data structures to exchange the data item. This approach can be efficient under many circumstances, but it is often easier to begin the application design process from a data-centric perspective instead. Rather than looking at specific data requirements for processing end-points, approaching the design from the standpoint of what data is generated through acquisition or processing may make more sense. If the data can be useful to any other process, it can be made available without knowing where or when it might be used.

An emerging alternative for asynchronous data transfers that fits this perspective is publish-subscribe (or “pub-sub”). In this model, data sources, or producers, publish data to a known location on the network. This could be a memory-to-memory transfer, if high performance is required, or it could be a database or other persistent storage.

Processes that need that data can subscribe to it through the messaging service. When published data arrives at the shared location, a message goes out to the subscribers. Subscribers can then go to the shared location to obtain the data, and use it in their own processing.

A publish-subscribe model for data distribution enables the implementation of such a data-centric architecture across a large-scale network. In the programmatic-trading example, a node can publish CCI index data to a known location on the network, and the other trading subsystems can subscribe to the CCI index data.
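The publish-subscribe flow described above can be sketched in a few lines. This is a minimal, in-process illustration, not the API of any real middleware product: the `DataBus` class, the `cci_index` topic name, and the sample fields are all assumptions made for the example, and the bus delivers samples directly to subscribers rather than notifying them to fetch from a shared location.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class DataBus:
    """Hypothetical in-process stand-in for publish-subscribe middleware."""

    def __init__(self) -> None:
        # Maps a topic name to the handlers subscribed to it.
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, sample: Any) -> int:
        """Deliver a sample to every subscriber; return how many received it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(sample)
        return len(handlers)


bus = DataBus()
momentum_inbox: list = []
risk_inbox: list = []

# Two trading subsystems subscribe by topic name only; neither knows the publisher.
bus.subscribe("cci_index", momentum_inbox.append)
bus.subscribe("cci_index", risk_inbox.append)

# The producing node publishes without knowing who, if anyone, is listening.
delivered = bus.publish(
    "cci_index", {"symbol": "XYZ", "long_cci": 112.5, "short_cci": -40.0}
)
```

Adding a third subscriber is one more `subscribe` call; the publishing node and the existing subscribers are untouched, which is the decoupling the model is meant to provide.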

With the publish-subscribe approach, you can upgrade or add endpoints without having to change existing code or exhaustively retest the resulting configuration. Certainly if new data is available on the network, other endpoints may require additional code in order to make use of that data, but in practice, this is significantly simpler than modifying and testing a large number of specific one-to-one connections.

A commonality of data formats isn’t necessary in a publish-subscribe model. Because the source and destination of a given data item are unknown to each other, any required data conversion occurs at the handoff. In fact, data consumers may well have different data format requirements, necessitating individual conversions based on those needs. The middleware that provides the publish-subscribe services can manage any required data conversions.
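One way to picture conversion at the handoff is a bus that applies a per-subscriber converter before delivery. The sketch below is illustrative only: `ConvertingBus`, the `quotes` topic, and both converter functions are hypothetical names, and real middleware would typically drive such conversions from type descriptions rather than hand-written functions.

```python
from typing import Any, Callable, Dict, List, Tuple

Converter = Callable[[Dict[str, Any]], Any]
Handler = Callable[[Any], None]


class ConvertingBus:
    """Hypothetical bus that converts each sample at the handoff to a subscriber."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Tuple[Converter, Handler]]] = {}

    def subscribe(self, topic: str, handler: Handler, convert: Converter = dict) -> None:
        # The default converter, dict, hands over an unchanged shallow copy.
        self._subscribers.setdefault(topic, []).append((convert, handler))

    def publish(self, topic: str, sample: Dict[str, Any]) -> None:
        for convert, handler in self._subscribers.get(topic, []):
            handler(convert(sample))


def to_cents(sample: Dict[str, Any]) -> Dict[str, Any]:
    """Consumer-specific format: integer cents instead of a float price."""
    return {"symbol": sample["symbol"], "last_cents": round(sample["last"] * 100)}


def to_csv_row(sample: Dict[str, Any]) -> str:
    """Consumer-specific format: one comma-separated text row."""
    return f'{sample["symbol"]},{sample["last"]:.2f}'


bus = ConvertingBus()
raw, ledger, report = [], [], []
bus.subscribe("quotes", raw.append)                         # wants the published form
bus.subscribe("quotes", ledger.append, convert=to_cents)    # wants integer cents
bus.subscribe("quotes", report.append, convert=to_csv_row)  # wants text rows

bus.publish("quotes", {"symbol": "XYZ", "last": 101.25})
```

The publisher emits one format; each consumer states the format it needs when it subscribes, so no producer code changes when a consumer with new format requirements appears.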

Designing with Data-Centric Principles
Data-centricity provides a guide for designing distributed applications in general. Many system architects of distributed applications today use procedural or object-oriented principles in creating the fundamental design, often using UML sequence, class, and state diagrams.

These design methodologies tend to treat transporting and consuming data as second-class citizens, focusing instead on the step-by-step processes by which endpoints make computations and produce actionable results. A data-oriented methodology, on the other hand, focuses on the flow of data through the application.

Taking a closer look at the alternative design/development methodologies only highlights the issues. Procedure-oriented approaches focus on the computational aspects of the application, and concentrate on the device endpoints of the network where almost all processing occurs. Data is structured to assist with the computation rather than to make the data available in the first place. Once the computational processes are designed, the problem of getting data to those processes remains. This ordering makes data movement almost an afterthought, and problems of timeliness and data formats become more difficult to address. Ultimately, the focus on computation means that the endpoint devices must have functions and data that are known and accounted for by one another.

Object-oriented methodologies base designs on the definition of objects and their interactions. Because objects are defined by their computational processes (methods) as well as data, this approach appears to have a good balance between code and data. However, object-oriented methodologies presume that data is an inherent part of the object. The data formats are fixed within the object, and the methods act upon the data only within the context of the object. There is no inherent provision for exchanging data between objects, or for adapting data to the formats required for various processes.

The overriding characteristic of these common design methodologies is that they envision a distributed application as a set of processes, designed as procedures or objects. Data is assumed to be contained within each of the processes. Getting data to and from a particular process is a problem to be addressed after all of the processes are defined.

Alternatively, a data-centric methodology provides a much more natural and streamlined way of viewing and modeling many distributed applications. Such a methodology focuses on the data that is moving and transforming in the system, rather than the processes that are performing those actions.

In other words, the processes encapsulated in the endpoints become secondary. The flow of data defines the essential aspects of the application, and can be modeled with UML class and interaction diagrams. That is not to say that the computational processes are not important, but rather that they are not essential at a high level of design in understanding and mapping out the application. In real-time systems, getting the data to the process that requires it, when it requires it, is essential.

Implementing Data-Centric Applications
Data-centric applications can be implemented using the precepts of data-oriented programming. In general, the tenets of data-oriented programming include the following principles:

  • Expose the data. Ensure that the data is visible throughout the entire system. Hiding the data makes it difficult for new processing endpoints to identify data needs and gain access to that data.
  • Hide the code. Conversely, none of the computational endpoints has any reason to be cognizant of another’s code. By abstracting away from the code, data is free to be used by any process, no matter where it was generated. This provides for data to be shared across the distributed application, and for the application to be modified and enhanced during its lifecycle.
  • Separate data and code into data-handling and data-processing components. Data handling is required because of differing data formats, persistence, and timeliness, and is likely to change during the application lifecycle. Conversely, data processing requirements are likely to remain much more stable. By separating the two, the application becomes easier to maintain and modify over time.
  • Generate code from process interfaces. Interfaces define the data inputs and outputs of a given process. Having well-defined inputs and outputs makes it possible to understand and automate the implementation of the data-processing code.
  • Loosely couple all code. With well-defined interfaces and computational processes abstracted away from one another, endpoints and their computations can be interchanged with little or no impact on the distributed application as a whole.
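The principles above can be illustrated with a small sketch. Everything named here is an assumption made for the example: the `Quote` and `TypicalPrice` types, the JSON wire format, and the choice of computation (the average of high, low, and close). The data types are exposed and immutable, the data-handling function is the only code that knows the wire format, and the data-processing function is coupled only to its input and output types.

```python
import json
from dataclasses import dataclass


# Expose the data: plain, immutable types that any endpoint can read.
@dataclass(frozen=True)
class Quote:
    symbol: str
    high: float
    low: float
    close: float


@dataclass(frozen=True)
class TypicalPrice:
    symbol: str
    value: float


# Data handling: the only code that knows the (assumed) JSON wire format.
def decode_quote(payload: str) -> Quote:
    fields = json.loads(payload)
    return Quote(
        symbol=fields["symbol"],
        high=float(fields["high"]),
        low=float(fields["low"]),
        close=float(fields["close"]),
    )


# Data processing: depends only on the Quote -> TypicalPrice interface,
# so its internals stay hidden from every other endpoint.
def typical_price(quote: Quote) -> TypicalPrice:
    return TypicalPrice(quote.symbol, (quote.high + quote.low + quote.close) / 3)


result = typical_price(
    decode_quote('{"symbol": "XYZ", "high": 12.0, "low": 9.0, "close": 9.0}')
)
```

If the feed format changes, only `decode_quote` needs attention; if the calculation changes, only `typical_price` does. That is the maintenance benefit the separation is meant to deliver.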

Table 1 summarizes these and other principles, and it offers a comparison with object-oriented development tenets in order to contrast the two different approaches. The data-oriented approach enforces attention on the data rather than on the processes that manipulate the data.

Object-Oriented Programming Principles | Data-Oriented Programming Principles
-------------------------------------- | ------------------------------------
Hide the data (encapsulation)          | Expose the data (with MR format)
Expose methods (code)                  | Hide the code
Intermix data and code                 | Separate data and code
Mobile code                            | Must agree on data mapping, mapping system
API/object model                       | Messages are primary data model or schema
Combined processing, no restrictions   | Strict separation of parser, validator, transformer, and logic
Changes: Read and change code          | Changes: Change declarative data file
Tightly coupled                        | Loosely coupled

Table 1. A Comparison of Data-Oriented and Object-Oriented Programming Principles
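The "strict separation of parser, validator, transformer, and logic" entry in Table 1 might look like the following in practice. This is a sketch under stated assumptions: the `key=value;key=value` text format, the field names, and the volume threshold are invented for illustration and do not come from the article.

```python
from typing import Dict


def parse(line: str) -> Dict[str, str]:
    """Parser: turn 'key=value;key=value' text into raw string fields."""
    return dict(item.split("=", 1) for item in line.split(";") if item)


def validate(fields: Dict[str, str]) -> Dict[str, str]:
    """Validator: reject records that are incomplete or malformed."""
    missing = {"symbol", "volume"} - fields.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not fields["volume"].isdigit():
        raise ValueError("volume must be a non-negative integer")
    return fields


def transform(fields: Dict[str, str]) -> Dict[str, object]:
    """Transformer: convert validated strings into the types the logic expects."""
    return {"symbol": fields["symbol"].upper(), "volume": int(fields["volume"])}


def is_heavy_volume(record: Dict[str, object], threshold: int = 1_000_000) -> bool:
    """Logic: the only stage that makes an application decision."""
    return record["volume"] >= threshold


record = transform(validate(parse("symbol=xyz;volume=2500000")))
heavy = is_heavy_volume(record)
```

Because each stage has a single job, a change in the incoming text format touches only `parse`, and a change in the trading rule touches only `is_heavy_volume`.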

The data-oriented approach to application design is effective in systems where multiple data sources are required for successful completion of the computing activity, but those data sources reside in separate nodes on a network in a net-centric application infrastructure. For network-centric distributed applications, applying a data-oriented programming model lets you focus on the movement of data through the network, an easier and more natural way of abstracting and implementing the solution.

Data as the Design and Implementation Focal Points
Data-centric design and data-oriented implementation can bring about a more robust and scalable distributed system, and one that is easier to maintain and enhance over time. For real-time distributed applications that are highly dependent upon the movement of data through the system, the advantages of using data as the design and implementation focal points can make the difference for a successful project.
