The Byzantine Generals Problem
The Byzantine Generals Problem is an abstract generalization for thinking about the failure in computer systems. Processors fail in multi-threaded systems, and nodes fail in distributed network systems. More specifically, processors and nodes can fail in strange and possibly malicious ways. How can we design computer systems to deal with these problems and what are the limitations of such solutions?
The Original Problem
Several, separate divisions of the Byzantine army are camped outside an enemy city. Each division is commanded by its own general. Some part of the army observes the enemy and wants to communicate the plan of action to the rest of the army. The army wants to either retreat or attack as a whole, but the generals can only communicate with a messenger.
Unfortunately, some of the generals may be traitors. They may tell other generals to retreat when they should have attacked, and vice versa. Worse, they can tell different generals different things. (In a computer system, whether or not this action is malicious or simply erroneous is irrelevant.) More formally, the army has the following goals:
A. All loyal (non-traitorous) generals decide on the same plan.
B. A small group of traitors cannot force the loyal generals to adopt a bad plan.
The paper admits that B is difficult to formalize. It would be unfortunate if, when the loyal generals detected a group of defectors, they simply retreated unconditionally. This would be a “bad” plan, since it wasn’t an intelligent decision and was triggered by a few bad actors.
You could imagine a strategy in which every general sends his proposed plan to every other general, and each general simply takes the majority plan of action.
But here, you can imagine that the evil general can tie up general 1 and 2, while convincing general 3 to attack. Because traitorous generals can send different information to different generals, if every general simply used
v(i) (the ith general’s value) for every
i, each general could have very different information. As such, the authors define the following property, for which goal A depends.
- Every loyal general must obtain the same information
v(1) . . . v(n). Or, any two loyal generals use the same value for
Condition A implies Condition 1, since as we’ve seen a traitor can send different values to different recipients. And if all loyal general obtains the same information, and they’re following the same algorithm, all loyal generals can decide on the same plan of action, so A <=> 1. (Note that 1 states that loyal generals use the same value for traitors, even if they received different messages from traitors…) When a recipient receives a message from a random general, there’s no way for the recipient to be sure that the value can be trusted. This suspicion may lead the recipient to incorrectly interpret a message from a general as a message from a traitor. The recipient may end up recording the wrong value…
This is a problem because the system may end up violating Condition B: loyal generals may take a conservative approach and interpret all messages as, say, retreat (which isn’t what they wanted to do, and is thus a bad plan).
In order to restrict the system and ensure this clause isn’t violated, we add another condition:
- If the
ithgeneral is loyal, the value he sends must be used by every loyal general as the value of
Note that if the
ith general is not loyal, the consequent does not hold.
Conditions 1 and 2 place restrictions on what loyal generals record for the
ith general’s broadcast. Condition 1 states that all loyal generals must use the same value, and Condition 2 states further that if
i is loyal, all loyal generals must use the correct value (the one he actually sent). Note that if we obtain 1 and 2, we obtain our goals, A and B.
The Byzantine Generals Problem
It turns out that we can reduce the aforementioned summary to a different problem: rather than
n generals, we can imagine 1 commanding general and
n - 1 lieutenant generals. Let’s restate our goals in terms of this new system:
IC1. All loyal lieutenants obey the same order.
IC2. If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.
Note these are just transformations of our original Conditions 1 and 2 in terms of the Byzantine Generals Problem. If we can construct a system that satisfies IC1 (IC == Interactive Consistency) and IC2, we can solve the Original Problem by simply treating each general as a commanding general in turns.
Unfortunately, it’s not possible to construct a solution to the Byzantine Generals Problem with 3 participants if 1 of the generals is a traitor.
Let’s assume we are Lieutenant 1. We have no way of knowing if the commander is a traitor, or if the other lieutenant is. If we receive
attack from the general and
he said attack from the lieutenant, we know to attack, since either both generals are loyal or traitorous, and we have no choice. But if we receive conflicting reports, such as
attack from the commander and
he said retreat from the lieutenant, we can’t know who is telling the truth, since as illustrated, the two scenarios appear the same to us. The problem is we have no way of knowing what the commander actually said to the lieutenant.
This is not a rigorous proof, and the authors point readers to Reaching Agreement in the Presence of Faults for a better overview.
Extrapolation to 3
Using this result from their other paper, the authors extend the result to show that if there are
m traitors, there must be at least
3m + 1 total generals in the system to satisfy IC1 and IC2. If the traitors make up one third or more of the system, it’s impossible to satisfy these conditions.
In order to prove this, they use contradiction. Assume there is a solution to the Byzantine Generals Problem for a total of
3m generals, and
m traitors. With this knowledge, we can construct a reduction: assume there are
3m Albanian generals and
m Albanian traitors within the
If we add 3 Byzantine generals into the mix and split the Albanian generals into groups of 3 (of size
m). One of these groups is the group of traitors. We can map each group to a Byzantine general. Thus, one Byzantine general follows the traitors’ pattern. We know that 3 Byzantine generals is impossible, so we arrive at a proof by contradiction. (In the paper, this proof is quite hand-wavy and I don’t fully understand it).
In the end, we come up with the famous result: a system of
n nodes can only handle
m failures, where
m = (n - 1) / 3. Or,
n = 3m + 1.
A Solution For
3m + 1
Given 3 assumptions:
A1. Every message that is sent is delivered correctly.
A2. The receiver of a messages knows who sent it.
A3. The absence of a message can be detected.
We wish to obtain goal IC2 and IC2. Note we are only concerned with each general receiving the correct information from the commander, and we are one step away from the Original Problem. This solution details how each general agrees upon a value sent by the commander. In order to transpose this solution to the Original Problem, each general would act as the commander to send his value to the others. Ideally then each general has an algorithm to agree upon a plan of action.
It turns out out that the paper does not provide a concise description of the solution to the problem (as noted here), so I’ll be splitting this into a second post, using the procedure and proof detailed in Reaching Agreement in the Presence of Faults.