Distributed Systems

How to build highly available system using state machine replication


What is a highly available system

  • A system is available if it provides service promptly on demand.
  • The only way to make a highly available system out of less available components is to use redundancy, so the system can work even when some of its parts are broken.
  • The simplest kind of redundancy is replication: make several copies or ‘replicas’ of each part.

What is state machine

A State Machine begins at the State labeled Start. Each Input received is passed through the transition and output function to produce a new State and an Output. The State is held stable until a new Input is received, while the Output is communicated to the appropriate receiver.

A state machine can be thought as following tuple of values.

  • A set of States
  • A set of Inputs
  • A set of Outputs
  • A transition function (Input × State → State)
  • An output function (Input × State → Output)
  • A distinguished State called Start.

What is state machine replication

In computer science, state machine replication (SMR) or state machine approach is a general method for implementing a fault-tolerant service by replicating servers and coordinating client interactions with server replicas. The approach also provides a framework for understanding and designing replication management protocols

The state machine approach for implementing fault tolerance

A very simple technique for implementing a fault-tolerant service in terms of a State Machine:

  1. Place copies of the State Machine on multiple, independent servers.
  2. Receive client requests, interpreted as Inputs to the State Machine.
  3. Choose an ordering for the Inputs.
  4. Execute Inputs in the chosen order on each server.
  5. Respond to clients with the Output from the State Machine.
  6. Monitor replicas for differences in State or Output.

How to build highly available system using consensus

  • The idea is to use a replicated deterministic state machine, and get consensus on each input.
  • To make it efficient, use leases (Locks that time out) to replace most of the consensus steps with actions by one process.
  • Most fault tolerant algorithm for consensus without real time guarantees is the Lamport Paxos Algorithm.
  • The recipe is to write a simple spec as a state machine, find the abstraction function from the implementation to the spec, establish suitable invariants, and show that the implementation simulates the spec


How useful was this post?

Click on a star to rate it!

Average rating 0 / 5. Vote count: 0

No votes so far! Be the first to rate this post.

As you found this post useful...

Follow us on social media!

We are sorry that this post was not useful for you!

Let us improve this post!

Tell us how we can improve this post?

0 0 votes
Article Rating
Notify of
Inline Feedbacks
View all comments