CAP Theorem

The previous section briefly introduced several important ideas, including eventual consistency (which the next section covers in depth). Before going into more detail, it is worth explaining an important result known as the CAP theorem.

CAP (Consistency, Availability, Partition Tolerance) Theorem, also named Brewer's theorem after computer scientist Eric Brewer, states that it is impossible for a distributed computer system to simultaneously provide all three (Consistency, Availability, Partition Tolerance) guarantees.

- Wikipedia

Of the three guarantees, a distributed system can provide only one of C (Consistency) or A (Availability) while the network is partitioned. Network failures are inevitable in a distributed system, so partitions have to be tolerated, and the practical choice is therefore between consistency and availability. Let's summarize these three guarantees concisely in this table:

A Data Lake built on the Lambda Architecture has to operate within the limits of this theorem. In this context, Availability is usually chosen over Consistency. As a result, the data becomes consistent only eventually, and in the meantime results are often approximations. This is known as eventual consistency. We will go into a bit more detail on this in the following section.

Figure 08: CAP theorem

Traditional ACID (Atomicity, Consistency, Isolation, Durability) based stores, such as RDBMSs, choose Consistency over Availability. NoSQL-based stores, which are more common in a Data Lake because of their features, follow the BASE (Basically Available, Soft state, Eventual consistency) philosophy and choose Availability over Consistency.
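The trade-off between the two choices can be made concrete with a small toy model. The sketch below is purely illustrative and not taken from any real data store: the names `TwoReplicaStore`, `Unavailable`, and `heal` are hypothetical, the single shared logical clock is a simplification that a genuinely partitioned system would not have, and last-write-wins is only one of several possible reconciliation strategies. In "CP" mode the store refuses requests during a partition; in "AP" mode it keeps answering, may return stale data, and converges once the partition heals.

```python
class Unavailable(Exception):
    """Raised when the CP store refuses a request during a partition."""


class TwoReplicaStore:
    """Toy key-value store with two replicas (hypothetical, for illustration)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        # Each replica maps key -> (version, value).
        self.replicas = [{}, {}]
        self.partitioned = False
        # Assumption: one shared logical clock orders all writes.
        self.clock = 0

    def write(self, replica_id, key, value):
        if self.partitioned and self.mode == "CP":
            # Consistency first: reject the write rather than let replicas diverge.
            raise Unavailable("peer unreachable, write refused")
        self.clock += 1
        record = (self.clock, value)
        self.replicas[replica_id][key] = record
        if not self.partitioned:
            # Healthy network: replicate to the peer immediately.
            self.replicas[1 - replica_id][key] = record

    def read(self, replica_id, key):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("peer unreachable, read refused")
        record = self.replicas[replica_id].get(key)
        return None if record is None else record[1]

    def heal(self):
        """End the partition and reconcile replicas with last-write-wins."""
        self.partitioned = False
        first, second = self.replicas
        for key in set(first) | set(second):
            candidates = [r[key] for r in (first, second) if key in r]
            newest = max(candidates, key=lambda rec: rec[0])
            first[key] = second[key] = newest


if __name__ == "__main__":
    ap = TwoReplicaStore("AP")
    ap.write(0, "x", "old")
    ap.partitioned = True
    ap.write(0, "x", "new")                  # still accepted: available
    print(ap.read(0, "x"), ap.read(1, "x"))  # replica 1 is stale for now
    ap.heal()
    print(ap.read(1, "x"))                   # replicas have converged
```

In this sketch the AP store stays available throughout but temporarily serves different values from its two replicas, which is the behavior the BASE philosophy accepts. The same sequence against `TwoReplicaStore("CP")` raises `Unavailable` on the second write, which mirrors the ACID-style preference for Consistency over Availability.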