A primary–secondary topology (sometimes called leader–follower, master–replica, or active–passive) is one of the most common architectures used in distributed systems and databases. It is simple, predictable, and easy to reason about. But it also has sharp edges that appear once systems scale or require strong availability.
Let’s break it down.
In a primary–secondary topology, one node is designated as the primary (leader), and one or more nodes are secondaries (replicas).
The core rule:
- All writes go to the primary
- Secondaries replicate the primary’s data
- Reads may come from secondaries
Basic flow:
```
Client  → Primary      (write)
Primary → Secondaries  (replication)
Client  → Secondary    (read)
```
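The routing rule can be sketched in a few lines of Python. The node addresses and the `route` helper here are illustrative, not a real database driver API:

```python
import itertools

# Hypothetical node addresses -- illustrative only.
PRIMARY = "db-primary:5432"
SECONDARIES = ["db-replica-a:5432", "db-replica-b:5432"]

_read_cycle = itertools.cycle(SECONDARIES)

def route(operation: str) -> str:
    """Pick a node for a query: writes go to the primary, reads round-robin."""
    if operation == "write":
        return PRIMARY          # single writer
    return next(_read_cycle)    # spread reads across replicas

print(route("write"))  # always the primary
print(route("read"))   # alternates between the replicas
```

Real client libraries hide this routing behind connection strings or load balancers, but the decision they make is the same one.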
Think of it like a newsroom.
- The primary editor writes the official article.
- Copy editors receive updates and distribute them.
- Readers can read from any copy editor, but only the primary editor changes the article.
## Why Systems Use It
The topology solves several practical problems.
### 1. Write coordination
If multiple nodes accept writes independently, conflicts become complex.
Primary–secondary solves this by enforcing:
Single writer → many readers
### 2. Read scaling
Secondaries can serve read traffic.
Example:
| Node | Role | Traffic |
|---|---|---|
| Primary | Writes + some reads | 10% |
| Secondary A | Reads | 45% |
| Secondary B | Reads | 45% |
This dramatically increases read throughput.
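As a rough illustration, here is the table's traffic split applied to an assumed cluster-wide load of 20k reads/sec (all numbers are hypothetical):

```python
# Hypothetical read load split across nodes, matching the table above.
total_reads_per_sec = 20_000
shares = {"primary": 0.10, "secondary_a": 0.45, "secondary_b": 0.45}

load = {node: int(total_reads_per_sec * share) for node, share in shares.items()}
print(load)  # {'primary': 2000, 'secondary_a': 9000, 'secondary_b': 9000}
```

Adding another secondary raises total read capacity without touching the write path.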
### 3. Failover capability
If the primary dies, a secondary can be promoted to primary.
This is common in:
- PostgreSQL replication
- MySQL replication
- MongoDB replica sets
- Redis replication
- Elasticsearch clusters
## The Replication Mechanism
Primary–secondary systems replicate changes from the primary to replicas.
Common mechanisms include:
### Log shipping
The primary writes changes to a write-ahead log (WAL).
Secondaries replay the log.
Example flow:

```
Write request
      ↓
Primary appends the change to the WAL
      ↓
Primary commits the transaction
      ↓
Secondaries read the WAL
      ↓
Secondaries replay the change
```

Note the order: the log entry is written *before* the commit — that is what "write-ahead" means.
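A toy version of log shipping, with in-memory dictionaries standing in for the primary and replica state (all names here are illustrative, not a real database API):

```python
# Minimal log-shipping sketch: the primary appends operations to a WAL,
# and a secondary replays the entries it has not yet applied.
wal = []              # the primary's write-ahead log: (sequence, key, value)
primary_state = {}
replica_state = {}
replica_applied = 0   # how far the replica has replayed

def primary_write(key, value):
    wal.append((len(wal) + 1, key, value))  # log first ("write-ahead")
    primary_state[key] = value              # then apply

def replica_catch_up():
    global replica_applied
    for seq, key, value in wal[replica_applied:]:
        replica_state[key] = value
        replica_applied = seq

primary_write("balance", 500)
primary_write("status", "active")
replica_catch_up()
print(replica_state == primary_state)  # True once the replica has replayed the log
```

Everything interesting about replication lag lives in the gap between `primary_write` and the next `replica_catch_up` call.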
### State snapshot + streaming
- Replica copies the full dataset
- Then receives incremental updates
This is how PostgreSQL streaming replication works: a base backup provides the snapshot, and WAL streaming supplies the incremental updates.
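The snapshot-plus-stream idea can be sketched with plain dictionaries standing in for the dataset (illustrative only):

```python
# Snapshot-then-stream sketch: a new replica copies the full dataset,
# then applies the incremental updates that arrived after the snapshot.
primary = {"a": 1, "b": 2}
pending_updates = []   # changes made while the snapshot was being taken

snapshot = dict(primary)           # step 1: full copy
primary["b"] = 3                   # a write lands on the primary mid-sync...
pending_updates.append(("b", 3))   # ...and is recorded for the stream

for key, value in pending_updates:  # step 2: replay the incremental stream
    snapshot[key] = value

print(snapshot == primary)  # True
```

The stream is what keeps the copy from being stale the moment it finishes.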
## The Hidden Assumption
Primary–secondary architectures assume something important:
The primary is always reachable and authoritative.
This assumption works well in small deployments.
But once networks get messy, the model begins to break down.
## When Primary–Secondary Breaks Down
Several real-world conditions cause problems.
### 1. Replication Lag
Replication is usually asynchronous, so secondaries run slightly behind the primary.
Example:

```
User writes:      balance = $500
Primary shows:    balance = $500  (updated immediately)
Secondary shows:  balance = $200  (not yet replicated)
```
This leads to stale reads.
Real production issues include:
- inconsistent dashboards
- outdated search indexes
- incorrect financial reads
Many systems describe this as read-after-write inconsistency.
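A minimal simulation of the stale read above, with asynchronous replication modeled as a function that simply has not run yet:

```python
# Simulated replication lag: the secondary only applies changes when
# replicate() runs, so a read in between sees a stale value.
primary = {"balance": 200}
secondary = {"balance": 200}

def write(key, value):
    primary[key] = value       # the write commits on the primary only

def replicate():
    secondary.update(primary)  # asynchronous catch-up, runs "later"

write("balance", 500)
stale = secondary["balance"]   # read-after-write hits the replica first
replicate()
fresh = secondary["balance"]
print(stale, fresh)  # 200 500
```

Common mitigations are reading your own writes from the primary, or pinning a session to the primary for a short window after a write.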
### 2. Primary Bottleneck
Since all writes go through the primary, it becomes a scaling limit.
If the primary handles:
- 50k writes/sec
- heavy transaction logic
- replication streams
It becomes CPU-, disk-, or network-bound.
You eventually hit a hard ceiling.
Solutions often involve:
- sharding
- partitioning
- multi-leader systems
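Sharding is the most common escape hatch: each shard gets its own primary, so write traffic divides across several leaders. A sketch with hypothetical shard names:

```python
import hashlib

# Hash-based sharding sketch: each shard has its own primary, so write
# load is split across several leaders. Shard names are illustrative.
SHARD_PRIMARIES = ["shard-0-primary", "shard-1-primary", "shard-2-primary"]

def primary_for(key: str) -> str:
    """Deterministically map a key to the primary of one shard."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return SHARD_PRIMARIES[int(digest, 16) % len(SHARD_PRIMARIES)]

print(primary_for("user:42"))  # the same key always maps to the same shard
```

Each shard is still a primary–secondary group internally; sharding multiplies the write ceiling, it does not remove it.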
### 3. Failover Complexity
If the primary fails, you must elect a new primary.
This introduces problems like:
**Split brain:** two nodes both believe they are the primary.
Example:
```
Network partition
Cluster A → elects node 1 as primary
Cluster B → elects node 2 as primary
```
Now both accept writes.
When the network heals, the system has conflicting histories.
Consensus protocols like Raft or Paxos exist largely to solve this problem.
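The core defense against split brain is a majority quorum: a node may become primary only with votes from a strict majority, so two sides of a partition can never both elect a leader. A sketch for a five-node cluster:

```python
# Majority-quorum sketch: election requires a strict majority of votes,
# so at most one side of any partition can ever elect a primary.
CLUSTER_SIZE = 5

def can_become_primary(votes: int) -> bool:
    return votes > CLUSTER_SIZE // 2  # needs at least 3 of 5

print(can_become_primary(3))  # True  -- the majority side of a partition
print(can_become_primary(2))  # False -- the minority side stays read-only
```

Raft and Paxos add a great deal on top (terms, log matching, leases), but this majority rule is the invariant that rules out two simultaneous primaries.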
### 4. Network Partitions
In distributed systems, networks fail more often than machines.
A partition might isolate the primary and one replica on one side, with the remaining three replicas on the other.
Which side should remain writable?
Systems must choose between:
- Consistency
- Availability
This is the CAP theorem tradeoff.
Some systems freeze writes. Others risk divergence.
### 5. Long-Tail Latency
Replication creates cascading latency.
Example:

```
Client    → Primary    (5 ms)
Primary   → Replica A  (20 ms)
Replica A → Replica B  (40 ms)
```
Large clusters can end up with a multi-second lag during spikes.
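The chained hops add up; here is the back-of-envelope sum for the example numbers, which assume replica B replicates from replica A rather than from the primary:

```python
# Chained replication: each hop adds its own delay, so the tail replica
# lags by the sum of every hop after the client's write.
hops_ms = [5, 20, 40]  # client→primary, primary→replica A, replica A→replica B
tail_delay_ms = sum(hops_ms)
print(tail_delay_ms)  # 65 ms before replica B sees the write, under normal load
```

Under load spikes each hop's delay grows, and the sum grows with it, which is how large chained clusters drift into multi-second lag.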
## Real Systems That Use Primary–Secondary
Examples include:
### PostgreSQL
Streaming replication model.
Primary handles all writes.
Replicas replay the WAL.
### MySQL
Classic asynchronous replication.
Often used with read replicas behind load balancers.
### MongoDB
Replica sets with automatic leader election.
### Redis
Primary with replica nodes.
Replication is asynchronous by default; the WAIT command offers semi-synchronous behavior.
## When Engineers Move Beyond It
Primary–secondary works great until systems require:
- massive write scaling
- low-latency global writes
- strong multi-region availability
At that point, architectures evolve to:
### Multi-primary (multi-leader)
Multiple nodes accept writes.
Examples:
- MySQL Group Replication
- CouchDB
### Leaderless systems
Clients write to multiple nodes directly.
Examples:
- DynamoDB
- Cassandra
- Riak
These designs trade simplicity for availability and scale.
## The Core Insight
Primary–secondary topology is popular because it simplifies a hard problem.
Instead of solving distributed consensus for every write, the system centralizes writes into one authority.
But that simplicity creates structural limits:
- single write bottleneck
- replication lag
- complicated failover
- partition sensitivity
This is why large distributed systems eventually evolve toward more complex topologies.
Rashan is a seasoned technology journalist and visionary leader serving as the Editor-in-Chief of DevX.com, a leading online publication focused on software development, programming languages, and emerging technologies. With his deep expertise in the tech industry and his passion for empowering developers, Rashan has transformed DevX.com into a vibrant hub of knowledge and innovation. Reach out to Rashan at [email protected]