Skip to content

Latest commit

 

History

History
145 lines (114 loc) · 8.06 KB

File metadata and controls

145 lines (114 loc) · 8.06 KB
+------------------------------------------------------------------------------------------------------------------------------------+
|                                              DATA REPLICATION AND STRONG CONSISTENCY                                                |
+------------------------------------------------------------------------------------------------------------------------------------+
|                          +--------------------+              +--------------------+              +--------------------+             |
|                          |   Client 1          |              |    Client 2         |              |    Client 3         |             |
|                          +--------------------+              +--------------------+              +--------------------+             |
|                                  |                                |                                |                             |
|                                 --->                              --->                              --->                           |
|                              +--------------------+              +--------------------+                                           |
|                              |     Load Balancer   |------------>|     Leader (Primary)  |                                           |
|                              +--------------------+              +--------------------+                                           |
|                                       |                                  |                                                       |
|                                +-------------------+               +--------------------+              +--------------------+     |
|                                |     Follower 1     |               |     Follower 2     |              |     Follower 3     |     |
|                                +-------------------+               +--------------------+              +--------------------+     |
|                                 |   - Syncs from Leader             |  - Syncs from Leader              |  - Syncs from Leader      |
|                                 |   - Handles Read Requests         |  - Handles Read Requests          |  - Handles Read Requests  |
+------------------------------------------------------------------------------------------------------------------------------------+

+-----------------------------------------+
|           Replication Techniques         |
+-----------------------------------------+
| - Leader-Follower Replication            |
| - Data Replication for High Availability |
| - Strong Consistency through Leader Node |
+-----------------------------------------+

+------------------------------------------+
|           Consistency Challenges          |
+------------------------------------------+
| - Replication Lag: Time taken for         |
|   followers to sync with the leader       |
| - Stale Reads: Data may not be current    |
|   if the replication is delayed           |
| - Eventual Consistency: Followers         |
|   eventually catch up with the leader     |
+------------------------------------------+

Data Replication and Strong Consistency

Explanation

1. What is Data Replication?

Data replication involves creating and maintaining copies of data across multiple machines connected over a network. These copies are referred to as replicas.

Why Replicate Data?

  • High Availability: Even if one server goes down, the data is still accessible from other replicas.
  • Fault Tolerance: Helps the system remain operational during hardware failures.
  • Improved Read Throughput: Multiple replicas can serve read requests simultaneously.

How Data Replication Works:

  1. Primary Database (Leader):
    • Stores the original data and handles all write operations (Create, Update, Delete).
  2. Replica Databases (Followers):
    • Maintain synchronized copies of the leader’s data.
    • Serve read operations to reduce the load on the primary database.
  3. Replication Process:
    • The leader records changes in a changelog and propagates them to followers.
    • Followers periodically synchronize with the leader to stay updated.

2. Strong Consistency:

Strong consistency means that clients cannot distinguish whether the data is replicated or not. In other words, users always see the latest data, even if it involves waiting for updates.

Key Characteristics:

  • All writes go to the leader/primary/master.
  • Reads can be performed from any replica but may face inconsistency if not updated.
  • The leader ensures that changes are propagated to all followers before acknowledging a write.

Issues with Strong Consistency:

  • Replication Lag:
    • Followers may display stale data until they catch up with the leader.
  • Eventual Consistency:
    • After some time, all replicas will have the same data, but during the replication process, there may be discrepancies.
  • Read-After-Write Consistency:
    • Ensures that once a write is acknowledged, any subsequent read will reflect the change.

3. Leader-Follower Model:

  • Leader: Handles writes and maintains the changelog.
  • Followers: Sync from the leader and handle reads.
  • Load Balancer (LB): Distributes read requests among followers to reduce load on the leader.

Fun Analogy:

Think of data replication like sharing a group photo among friends:

  • You take a picture (leader), and everyone else gets a copy (followers).
  • If you update your photo (e.g., add a filter), you must resend it to everyone for consistency.
  • Until everyone updates their copy, they may have the old version (inconsistency).
  • Eventually, everyone will have the same version (eventual consistency). 📸

Mnemonic:

"L-F-L-R" - Leaders Follow Leads Rightly

  • L - Leader: Primary node that handles writes.
  • F - Follower: Secondary nodes that sync with the leader.
  • L - Load Balancer: Distributes read requests.
  • R - Replication Lag: The delay before followers sync with the leader.

Interview Questions with Answers:

1. What is data replication and why is it important?

Answer:
Data replication is the process of copying data from one location (leader) to multiple other locations (followers) to ensure data availability, fault tolerance, and high read throughput. It is crucial for maintaining service continuity in distributed systems.


2. What are the challenges of achieving strong consistency?

Answer:
Strong consistency requires that all replicas have the latest data. Challenges include:

  • Replication Lag: Time taken for followers to catch up with the leader.
  • Stale Reads: Clients may read outdated data from followers if synchronization is not completed.
  • Trade-Off: Balancing consistency with availability and latency.

3. How do leader and follower roles differ in replication?

Answer:

  • Leader: Responsible for handling write operations and propagating changes.
  • Followers: Maintain copies of leader data and serve read operations.
  • Load Balancer: Directs read requests to followers to reduce leader load.

4. What is the difference between strong consistency and eventual consistency?

Answer:

  • Strong Consistency: All replicas immediately reflect updates. Clients always see the latest data.
  • Eventual Consistency: Replicas may temporarily show old data, but will eventually converge to the correct state. Suitable when low latency is prioritized over immediate consistency.

Additional Teaching:

In large-scale distributed systems, achieving strong consistency can be challenging. Many systems prefer eventual consistency when handling massive amounts of data, as it balances performance and reliability. Techniques such as quorum-based replication and conflict resolution strategies are employed to minimize inconsistency.