Distributed systems
Understanding distributed systems is essential to our understanding blockchain, as blockchain was a distributed system at its core. It is a distributed ledger that can be centralized or decentralized. A blockchain is originally intended to be and is usually used as a decentralized platform. It can be thought of as a system that has properties of the both decentralized and distributed paradigms. It is a decentralized-distributed system.
Distributed systems are a computing paradigm whereby two or more nodes work with each other in a coordinated fashion to achieve a common outcome. It is modeled in such a way that end users see it as a single logical platform. For example, Google's search engine is based on a large distributed system; however, to a user, it looks like a single, coherent platform.
A node can be defined as an individual player in a distributed system. All nodes are capable of sending and receiving messages to and from each other. Nodes can be honest, faulty, or malicious, and they have memory and a processor. A node that exhibits irrational behavior is also known as a Byzantine node after the Byzantine Generals problem.
The Byzantine Generals problem
In 1982, a thought experiment was proposed by Lamport et al. in their research paper, The Byzantine Generals Problem, which is available here:
https://www.microsoft.com/en-us/research/publication/byzantine-generals-problem/
In this problem, a group of army generals who lead different parts of the Byzantine army is planning to attack or retreat from a city. The only way of communicating among them is via a messenger. They need to agree to strike at the same time in order to win. The issue is that one or more generals might be traitors who could send a misleading message. Therefore, there is a need for a viable mechanism that allows for agreement among the generals, even in the presence of the treacherous ones, so that the attack can still take place at the same time. As an analogy for distributed systems, the generals can be considered honest nodes, the traitors as Byzantine nodes (that is, nodes with arbitrary behavior), and the messenger can be thought of as a channel of communication among the generals.
This problem was solved in 1999 by Castro and Liskov who presented the Practical Byzantine Fault Tolerance (PBFT) algorithm, which solves the consensus problem in the presence of Byzantine faults in asynchronous networks by utilizing the state machine replication protocol. PBFT goes through a number of rounds to eventually reach an agreement between nodes on the proposed value. PBFT and other consensus protocols will be discussed in greater detail in Chapter 5, Consensus Algorithms.
This type of inconsistent behavior of Byzantine nodes can be intentionally malicious, which is detrimental to the operation of the network. Any unexpected behavior by a node on the network, whether malicious or not, can be categorized as Byzantine.
A small-scale example of a distributed system is shown in the following diagram. This distributed system has six nodes, out of which one (N4) is a Byzantine node leading to possible data inconsistency. L2 is a link that is broken or slow, and this can lead to a partition in the network:
Figure 1.3: Design of a distributed system: N4 is a Byzantine node and L2 is broken or a slow network link
The primary challenge of a distributed system design is the coordination between nodes and fault tolerance. Even if some (a certain threshold dictated by the consensus protocol) of the nodes become faulty or network links break, the distributed system should be able to tolerate this and continue to work to achieve the desired result. This problem has been an active area of distributed system design research for many years, and several algorithms and mechanisms have been proposed to overcome these issues.
Distributed systems are so challenging to design that a theory known as the CAP theorem has been proven, which states that a distributed system cannot have all three of the much-desired properties simultaneously; that is, consistency, availability, and partition tolerance. We will dive into the CAP theorem in more detail later in this chapter.
Even though blockchain can be considered to be both a distributed and decentralized system, there are, however, critical differences between distributed systems and decentralized systems that make both of these systems architecturally different. We will discuss these differences in detail in Chapter 2, Decentralization.
With a better understanding of distributed systems, let's now move on to talking about blockchain itself. We'll begin with a brief rundown of the history of blockchain and Bitcoin.