What is Total Order Broadcast
- Total order broadcast and total order multicast (also called atomic broadcast and atomic multicast) are important problems in distributed systems, especially with respect to fault tolerance.
- In short, the primitive ensures that messages sent to a set of processes are delivered by all of those processes in the same total order.
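The ordering guarantee can be made concrete with a small check. The sketch below is illustrative only: the function name `same_total_order` and the log layout (a dict mapping a process name to the list of message ids it delivered) are assumptions, not part of any real protocol, and each message is assumed to appear at most once per log.

```python
from itertools import combinations


def same_total_order(logs):
    """Return True if every pair of delivery logs agrees on the relative
    order of the messages that both processes delivered."""
    for a, b in combinations(logs.values(), 2):
        common = set(a) & set(b)
        # Restrict each log to the shared messages and compare the sequences.
        if [m for m in a if m in common] != [m for m in b if m in common]:
            return False
    return True


# p3 is lagging behind, but what it has delivered matches the others.
consistent = {"p1": ["m1", "m2", "m3"], "p2": ["m1", "m2", "m3"], "p3": ["m1", "m2"]}
# p1 and p2 disagree on the order of m1 and m2.
conflicting = {"p1": ["m1", "m2"], "p2": ["m2", "m1"]}

print(same_total_order(consistent))
print(same_total_order(conflicting))
```

A process that has delivered fewer messages does not violate the order, as long as the messages it has delivered appear in the same relative order as everywhere else.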
What is the relation of Fault Tolerance and Total Order Broadcast
- Designing an algorithm for atomic broadcasts is relatively easy if it can be assumed that computers will not fail.
- If there are no failures, atomic broadcast can be achieved simply by having all participants communicate with one “leader”, which determines the order of the messages, while the other participants follow the leader.
- However, real computers are faulty; they fail and recover from failure at unpredictable, possibly inopportune, times. For example, in the follow-the-leader algorithm, what if the leader fails at the wrong time? In such an environment achieving atomic broadcasts is difficult.
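The follow-the-leader idea described above can be sketched roughly as follows, assuming no process ever fails. The class names `Leader` and `Follower`, and the use of consecutive integer sequence numbers, are illustrative assumptions rather than the design of any particular system.

```python
class Leader:
    """Hypothetical sequencer: stamps each message with the next sequence number."""

    def __init__(self):
        self.next_seq = 0

    def order(self, msg):
        seq = self.next_seq
        self.next_seq += 1
        return seq, msg


class Follower:
    """Delivers messages strictly in sequence-number order."""

    def __init__(self):
        self.pending = {}          # seq -> msg, received but not yet deliverable
        self.delivered = []        # messages in delivery order
        self.next_to_deliver = 0

    def receive(self, seq, msg):
        self.pending[seq] = msg
        # Deliver every message whose predecessors have all been delivered.
        while self.next_to_deliver in self.pending:
            self.delivered.append(self.pending.pop(self.next_to_deliver))
            self.next_to_deliver += 1


leader = Leader()
followers = [Follower(), Follower()]
ordered = [leader.order(m) for m in ["a", "b", "c"]]

# Simulate a network that reorders messages: each follower sees a different arrival order.
for seq, msg in ordered:
    followers[0].receive(seq, msg)
for seq, msg in reversed(ordered):
    followers[1].receive(seq, msg)

print(followers[0].delivered)
print(followers[1].delivered)
```

Both followers end up with the same delivery order even though the messages arrived differently. The weakness is visible in the sketch: the counter `next_seq` lives only in the leader, so if the leader crashes, nothing in this design decides who continues the numbering or which stamped messages were already sent.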
Where is Total Order Broadcast used
- The ZooKeeper Atomic Broadcast (ZAB) protocol is the basic building block for Apache ZooKeeper, a fault-tolerant distributed coordination service that underpins Hadoop and many other important distributed systems.