What is a highly available system
- A system is available if it provides service promptly on demand.
- The only way to make a highly available system out of less available components is to use redundancy, so the system can work even when some of its parts are broken.
- The simplest kind of redundancy is replication: make several copies or ‘replicas’ of each part.
What is state machine
A State Machine begins at the State labeled Start. Each Input received is passed through the transition and output function to produce a new State and an Output. The State is held stable until a new Input is received, while the Output is communicated to the appropriate receiver.
A state machine can be thought as following tuple of values.
- A set of States
- A set of Inputs
- A set of Outputs
- A transition function (Input × State → State)
- An output function (Input × State → Output)
- A distinguished State called Start.
What is state machine replication
In computer science, state machine replication (SMR) or state machine approach is a general method for implementing a fault-tolerant service by replicating servers and coordinating client interactions with server replicas. The approach also provides a framework for understanding and designing replication management protocols
The state machine approach for implementing fault tolerance
A very simple technique for implementing a fault-tolerant service in terms of a State Machine:
- Place copies of the State Machine on multiple, independent servers.
- Receive client requests, interpreted as Inputs to the State Machine.
- Choose an ordering for the Inputs.
- Execute Inputs in the chosen order on each server.
- Respond to clients with the Output from the State Machine.
- Monitor replicas for differences in State or Output.
How to build highly available system using consensus
- The idea is to use a replicated deterministic state machine, and get consensus on each input.
- To make it efficient, use leases (Locks that time out) to replace most of the consensus steps with actions by one process.
- Most fault tolerant algorithm for consensus without real time guarantees is the Lamport Paxos Algorithm.
- The recipe is to write a simple spec as a state machine, find the abstraction function from the implementation to the spec, establish suitable invariants, and show that the implementation simulates the spec