+------------------------------------------------------------------------------------------------------------------------------------+
| DATA REPLICATION AND STRONG CONSISTENCY |
+------------------------------------------------------------------------------------------------------------------------------------+
| +--------------------+ +--------------------+ +--------------------+ |
| | Client 1 | | Client 2 | | Client 3 | |
| +--------------------+ +--------------------+ +--------------------+ |
| | | | |
| ---> ---> ---> |
| +--------------------+ +--------------------+ |
| | Load Balancer |------------>| Leader (Primary) | |
| +--------------------+ +--------------------+ |
| | | |
| +-------------------+ +--------------------+ +--------------------+ |
| | Follower 1 | | Follower 2 | | Follower 3 | |
| +-------------------+ +--------------------+ +--------------------+ |
| | - Syncs from Leader | - Syncs from Leader | - Syncs from Leader |
| | - Handles Read Requests | - Handles Read Requests | - Handles Read Requests |
+------------------------------------------------------------------------------------------------------------------------------------+
+-----------------------------------------+
| Replication Techniques |
+-----------------------------------------+
| - Leader-Follower Replication |
| - Data Replication for High Availability |
| - Strong Consistency through Leader Node |
+-----------------------------------------+
+------------------------------------------+
| Consistency Challenges |
+------------------------------------------+
| - Replication Lag: Time taken for |
| followers to sync with the leader |
| - Stale Reads: Data may not be current |
| if the replication is delayed |
| - Eventual Consistency: Followers |
| eventually catch up with the leader |
+------------------------------------------+
Data replication involves creating and maintaining copies of data across multiple machines connected over a network. These copies are referred to as replicas.
- High Availability: Even if one server goes down, the data is still accessible from other replicas.
- Fault Tolerance: Helps the system remain operational during hardware failures.
- Improved Read Throughput: Multiple replicas can serve read requests simultaneously.
- Primary Database (Leader):
- Stores the original data and handles all write operations (Create, Update, Delete).
- Replica Databases (Followers):
- Maintain synchronized copies of the leader’s data.
- Serve read operations to reduce the load on the primary database.
- Replication Process:
- The leader records changes in a changelog and propagates them to followers.
- Followers periodically synchronize with the leader to stay updated.
Strong consistency means that clients cannot distinguish whether the data is replicated or not. In other words, users always see the latest data, even if it involves waiting for updates.
- All writes go to the leader/primary/master.
- Reads can be performed from any replica but may face inconsistency if not updated.
- The leader ensures that changes are propagated to all followers before acknowledging a write.
- Replication Lag:
- Followers may display stale data until they catch up with the leader.
- Eventual Consistency:
- After some time, all replicas will have the same data, but during the replication process, there may be discrepancies.
- Read-After-Write Consistency:
- Ensures that once a write is acknowledged, any subsequent read will reflect the change.
- Leader: Handles writes and maintains the changelog.
- Followers: Sync from the leader and handle reads.
- Load Balancer (LB): Distributes read requests among followers to reduce load on the leader.
Think of data replication like sharing a group photo among friends:
- You take a picture (leader), and everyone else gets a copy (followers).
- If you update your photo (e.g., add a filter), you must resend it to everyone for consistency.
- Until everyone updates their copy, they may have the old version (inconsistency).
- Eventually, everyone will have the same version (eventual consistency). 📸
"L-F-L-R" - Leaders Follow Leads Rightly
- L - Leader: Primary node that handles writes.
- F - Follower: Secondary nodes that sync with the leader.
- L - Load Balancer: Distributes read requests.
- R - Replication Lag: The delay before followers sync with the leader.
Answer:
Data replication is the process of copying data from one location (leader) to multiple other locations (followers) to ensure data availability, fault tolerance, and high read throughput. It is crucial for maintaining service continuity in distributed systems.
Answer:
Strong consistency requires that all replicas have the latest data. Challenges include:
- Replication Lag: Time taken for followers to catch up with the leader.
- Stale Reads: Clients may read outdated data from followers if synchronization is not completed.
- Trade-Off: Balancing consistency with availability and latency.
Answer:
- Leader: Responsible for handling write operations and propagating changes.
- Followers: Maintain copies of leader data and serve read operations.
- Load Balancer: Directs read requests to followers to reduce leader load.
Answer:
- Strong Consistency: All replicas immediately reflect updates. Clients always see the latest data.
- Eventual Consistency: Replicas may temporarily show old data, but will eventually converge to the correct state. Suitable when low latency is prioritized over immediate consistency.
In large-scale distributed systems, achieving strong consistency can be challenging. Many systems prefer eventual consistency when handling massive amounts of data, as it balances performance and reliability. Techniques such as quorum-based replication and conflict resolution strategies are employed to minimize inconsistency.