The Database in a Traditional Monolithic System
The database in a traditional enterprise software environment is at the center of everything. It is typically a big, centralized, inflexible relational database, operated, maintained, and guarded by dedicated DBAs. Multiple applications all talk to the same big database, and any schema change can have dire consequences.
Scaling and Agility
Massive web applications and big data changed all that. The data simply didn't fit on a single database server; vertical scaling reached its limits for many applications. In addition, the hectic pace of Agile development didn't work well with traditional database management practices.
The New Breed
A whole slew of databases with unique design goals and target domains emerged. New categories such as document stores, key-value stores, and columnar databases, along with countless variations, popped up every which way. NoSQL became a required buzzword. At first NoSQL meant "No SQL," and later it came to mean "Not Only SQL"; it turns out those relational databases actually have a point. That made life even more interesting when companies tried to figure out the right combination of database technologies to use and how to use them.
Another big trend was Service-Oriented Architecture (SOA). SOA was all about dividing the system into services that communicate through well-defined APIs. Microservices architectures are an evolution of SOA, a kind of more agile SOA: you still have services, but they are typically smaller, there are more of them, they can be deployed independently, and they often interact using lightweight REST APIs.
The Data Problem
So far so good. The problem is what to do with the data? Those microservices are supposed to be autonomous. Where do they keep their data? How would they expose their data to other services, applications and users? What about data that needs to be shared by multiple services? Who is the owner? What are the rules of engagement? Lots of big questions. Let’s look at some of the possible approaches to address those questions.
The Naked Database
With the naked database approach you just sidestep the whole issue. You still have your centralized database, and all the microservices access it. You pretty much sacrifice the notion of microservices because they are not that autonomous anymore; they depend heavily on the central database. But they are isolated from each other at the code level and can be deployed independently. Schema changes become a big deal because you have to figure out which microservices may be impacted by the change and upgrade them in tandem. The benefits are good performance (no intermediaries between the DB and the consumers, and joins can be performed at the DB level) and central management. The downside is that if you are at a scale where you need microservices, your data is probably too big for a relational DB anyway, and the centralization creates inadvertent data-level inter-dependencies between microservices.
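The coupling described above can be sketched in a few lines. This is a hypothetical illustration (the table, column, and function names are all invented), with two "microservices" reduced to plain functions that both embed knowledge of the same shared schema:

```python
import sqlite3

# Shared, centralized database that both services query directly.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total_cents INTEGER)"
)
db.executemany(
    "INSERT INTO orders (user_id, total_cents) VALUES (?, ?)",
    [(1, 999), (1, 2500), (2, 100)],
)

def billing_total(conn, user_id):
    # The billing service embeds the schema in its own SQL.
    return conn.execute(
        "SELECT SUM(total_cents) FROM orders WHERE user_id = ?", (user_id,)
    ).fetchone()[0]

def report_order_count(conn, user_id):
    # The reporting service depends on the very same schema. Renaming
    # the `orders` table or a column breaks both services at once,
    # even though they are deployed independently.
    return conn.execute(
        "SELECT COUNT(*) FROM orders WHERE user_id = ?", (user_id,)
    ).fetchone()[0]

print(billing_total(db, 1))       # 3499
print(report_order_count(db, 1))  # 2
```

Note that the good performance comes from exactly the same place as the coupling: both functions go straight to the database, with no intermediary.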
Centralized Data Service
You are still using a centralized database, but you put one big service in front of it. This extra layer of indirection lets you hide a big distributed database, or even multiple databases, behind a single front. You can add transparent caching and do all kinds of smart stuff behind the scenes, including distributed caching, fine-grained access control, and more. You can even split your database without telling anyone. Still, this doesn't scale if you have hundreds or thousands of microservices all requiring access to some part of the data: the single data service becomes a bottleneck.
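A minimal sketch of the indirection, assuming an invented `CentralDataService` class with a dict standing in for the real database. The point is that callers go through one front, which can transparently add read-through caching (or swap backends) without the callers noticing:

```python
class CentralDataService:
    """Single service fronting the database(s); all names are illustrative."""

    def __init__(self, backend):
        self._backend = backend  # could be one DB or several, hidden from callers
        self._cache = {}

    def get_user(self, user_id):
        # Read-through cache: callers never know whether the answer
        # came from the cache or from the backing store.
        if user_id not in self._cache:
            self._cache[user_id] = self._backend[user_id]
        return self._cache[user_id]

    def invalidate(self, user_id):
        self._cache.pop(user_id, None)

backend = {42: {"name": "alice"}}   # stand-in for the real database
svc = CentralDataService(backend)
print(svc.get_user(42)["name"])     # alice (fetched from the backend)
backend[42] = {"name": "bob"}       # the DB changed behind the scenes...
print(svc.get_user(42)["name"])     # alice (served from the cache)
```

The same facade is also where fine-grained access control or a database split would live, invisibly to the consumers.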
Generic Micro Data Services
With this approach, multiple databases stand on their own, unaware of exactly how the overall system is using them. Each database has a data service associated with it that exposes its data through an API. The API doesn't have to be a high-overhead REST API; it may use higher-performance transports such as raw TCP, as long as it is used internally and not exposed to the world. This design is scalable: you can easily add more services and more data without impacting existing services, several microservices can use data from the same generic data service, all the generic data services can share infrastructure, and schema changes are easy. Performance may be a problem, though, because a consumer may need to send many individual queries to a generic data service to assemble the data it needs.
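The performance caveat can be made concrete. In this hypothetical sketch (class and key names invented), the generic data service exposes only plain get/put operations, so a consumer that needs an aggregate has to issue one query per record:

```python
class GenericDataService:
    """Generic CRUD over a keyspace, unaware of who its consumers are."""

    def __init__(self):
        self._store = {}
        self.query_count = 0  # instrumented to show how chatty consumers get

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        self.query_count += 1
        return self._store.get(key)

orders = GenericDataService()
for i in range(3):
    orders.put(f"order:{i}", {"total_cents": 100 * (i + 1)})

# The generic API has no "sum order totals" operation, so the consuming
# microservice must fetch every record individually and aggregate itself.
total = sum(orders.get(f"order:{i}")["total_cents"] for i in range(3))
print(total, orders.query_count)  # 600 3
```

With thousands of records, that one-query-per-record pattern is exactly where the overhead piles up.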
Custom Micro Data Services
This is the holy grail of microservices: each microservice is responsible for managing its own data, and nobody outside the service even sees the database. The data managed by the service is exposed through a custom API that may provide high-level operations and queries that would require multiple atomic operations against a generic data service. This approach works well if you understand how your data will be accessed and don't need totally ad-hoc queries. Sharing data between microservices is harder, as any additional access has to be added to the API explicitly.
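Contrast this with the generic approach: here the store is private to the service, and one high-level domain operation replaces many atomic gets. A hypothetical sketch (all names invented):

```python
class OrderService:
    """Owns its data outright; only domain operations are exposed."""

    def __init__(self):
        self._orders = {}  # private: no one outside the service sees this

    def place_order(self, order_id, user_id, total_cents):
        self._orders[order_id] = {"user": user_id, "total_cents": total_cents}

    def user_spend(self, user_id):
        # One high-level call replaces a chatty series of per-record
        # queries against a generic data service.
        return sum(
            o["total_cents"] for o in self._orders.values() if o["user"] == user_id
        )

svc = OrderService()
svc.place_order("a1", user_id=7, total_cents=999)
svc.place_order("a2", user_id=7, total_cents=2500)
svc.place_order("b1", user_id=8, total_cents=100)
print(svc.user_spend(7))  # 3499
```

The flip side, as noted above, is rigidity: if another service later needs, say, orders-per-day, someone has to add that operation to this API; there is no generic query to fall back on.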
Conclusion
Microservices are a great architecture for large-scale, complicated systems, but managing the data of such systems is non-trivial, to say the least. There are various options, and you'll have to decide which one works for you. It is also very likely that you'll use a combination of several approaches.