Distributed Systems

Key Design Lessons Learnt while implementing Paxos algorithm


Build Real world systems using Paxos

  • Fault-tolerance on commodity hardware can be achieved through replication. A common approach is to use a consensus algorithm to ensure that all replicas are mutually consistent.
  • Fault-tolerant algorithms tolerate a limited set of carefully selected faults. However, the real world exposes software to a wide variety of failure modes, including errors in the algorithm, bugs in its implementation
  • A real system is rarely specified precisely. Even worse, the specification may change during the implementation phase. Consequently, an implementation should be malleable.

What is Chubby

  • Chubby] is a fault-tolerant system at Google that provides a distributed locking mechanism and stores small files. Typically there is one Chubby instance, or “cell”, per data center. Several Google systems – such as the Google Filesystem (GFS) and Bigtable – use Chubby for distributed coordination and to store a small amount of metadata.
  • Chubby achieves fault-tolerance through replication. A typical Chubby cell consists of five replicas, running the same code, each running on a dedicated machine.
  • Chubby clients (such as GFS and Bigtable) contact a Chubby cell for service. The master replica serves all Chubby requests. If a Chubby client contacts a replica that is not the master, the replica replies with the master’s network address. The Chubby client may then contact the master. If the master fails, a new master is automatically elected, which will then continue to serve traffic based on the contents of its local copy of the replicated database. Thus, the replicated database ensures continuity of Chubby state across master failover.

Paxos Basics

Paxos is a consensus algorithm executed by a set of processes, termed replicas, to agree on a single value in the presence of failures. The algorithm consists of three phases, which may be repeated (because of failures).

  • Elect a replica to be the coordinator.
  • The coordinator selects a value and broadcasts it to all replicas in a message called the accept message. Other replicas either acknowledge this message or reject it.
  • Once a majority of the replicas acknowledge the coordinator, consensus has been reached, and the coordinator broadcasts a commit message to notify replicas.

Software Engineering Principles learnt while implementing Paxos

  • Fault-tolerant systems are expected to run continuously for long periods of time. Users are much less likely to tolerate bugs than in other systems.
  • Fault-tolerant algorithms are notoriously hard to express correctly, even as pseudo-code. This problem is worse when the code for such an algorithm is intermingled with all the other code that goes into building a complete system.
  • The chance for inconsistencies increases with the size of the code base, the duration of a project, and the number of people working simultaneously on the same code. We used various active self-checking mechanisms such as the liberal use of assert statements, and explicit verification code that tests data structures for consistency.
  • Given the current state of the art, it is unrealistic to prove a real system such as ours correct. To achieve robustness, the best practical solution in addition to meticulous software engineering is to test a system thoroughly

Lessons learnt from implementing Paxos

  • There are significant gaps between the description of the Paxos algorithm and the needs of a real-world system. In order to build a real-world system, an expert needs to use numerous ideas scattered in the literature and make several relatively small protocol extensions. The cumulative effort will be substantial and the final system will be based on an unproven protocol.
  • The fault-tolerance computing community has not developed the tools to make it easy to implement their algorithms.
  • The fault-tolerance computing community has not paid enough attention to testing, a key ingredient for building fault-tolerant systems.

How useful was this post?

Click on a star to rate it!

Average rating 0 / 5. Vote count: 0

No votes so far! Be the first to rate this post.

As you found this post useful...

Follow us on social media!

We are sorry that this post was not useful for you!

Let us improve this post!

Tell us how we can improve this post?