Why the words in “CAP theorem” are so confusing!

Consistency

The word “consistency” is extremely overloaded in the realm of distributed systems. In CAP, consistency means (informally) that every read reflects the most recent write. This is also known as single-copy consistency or strict / strong consistency. Data is replicated to multiple nodes in these systems, but any reads to this storage system should give the illusion that read and writes are actually going to the same node in a multi-user / concurrent environment. Put another way – there’s multiple copies of data in practice, but readers always see the latest (single) copy by some loose definition of “latest”.

Unfortunately, “strong” consistency is a grey area with lots of informal definitions. Some distributed systems researchers such as Martin Kleppman equate CAP consistency with linearlizability, which depending on your definition of strong consistency, is either a weaker form of strong consistency or a more precise and formal definition of strong consistency. Does consistency mean that every write is instantaneously reflected on all nodes? Or, in the case of linearlizability, that even though reads may not reflect the latest write they will always reflect the correct ordering of writes based on some agreed upon global time?

Data stores that support transactions are also characterized by their ACID properties (Atomic, Consistent, Isolated, Durable). ACID consistency has nothing to do with ensuring that readers get the latest write. It’s about the preservation of invariants in the data itself (such as keeping debit or credit balanced) within a transaction, which is often the responsibility of the application itself rather than the storage system. Relational databases support specific forms of invariant enforcement through referential integrity, but many invariants cannot be easily enforced by the storage system. Are they related at all? Well, they’re related insofar as CAP consistency trade-offs affect ACID consistency for system designers, but they’re still completely different notions of consistency! For example, if it’s possible for a read to return stale data during a transaction (dirty reads), this can make it difficult to maintain invariants for the application.

Availability

Availability in CAP has very strict, binary definition. CAP availability states that every non-failing node (even if isolated from the rest of the system) can respond successfully to requests. This is confusing because most site operators talk about availability in terms of uptime and percentages. For example, consider this scenario of a system that currently serves most reads from secondary nodes. Lets assume that most secondary nodes in a system fails and can no longer serve requests. The server switches all reads to the primary and is still technically “available” from a user standpoint. This fails the CAP definition of availability, but passes the more practical operational definition of availability. One way to distinguish one from the other when they’re discussed at the same time is to refer to CAP availability as “big A” and practical availability / uptime as “little a”.

In practice, availability (“little a”) isn’t black and white (can EVERY non-failing node respond?). Most real world systems don’t even offer this strong guarantee. This definition might make sense in a theoretical system and make it easier to reason with more formally / mathematically, but in reality many storage systems allow users to choose between many different levels of availability and for different operations. For example, some databases allow users to tune availability by reads or writes using quorums. Systems that prioritize data durability in the face of failures may sacrifice some availability for writes. If writes cannot be propagated to a majority of nodes in the system, return an error. At the same time, it may also prioritize availability for reads and so perhaps any response from any replica is considered successful, even if that data might be stale. If you’re trying to reason about the availability of a storage system, these details matter and are far more relevant than whether or not the system is capable of achieving big A (CAP availability).

Partition Tolerance

Partition tolerance is confusing on multiple levels. One one level, the phrase itself is confusing because “partitions” in a distributed system are splits or shards of a data store that is maintained for purposes of scalability. In CAP, a partition refers to some network partition (also a confusing phrase) which is essentially a network error where communication between node are interrupted. If a system remains operational in the presence of a network partition, then it is considered partition tolerant in the CAP model.

On another level, partition tolerance is confusing because networks are unreliable and system designers need to design systems to tolerate network errors. It’s not a real choice or trade-off that should be made – it’s more of a requirement. Unlike availability and consistency (which are actual trade-offs that need to be made when there is a network partition), partition tolerance is not something you can relax in practice. So designers are really choosing between consistency and availability, assuming that there will be network partitions.

Finally, partitions in the network are relatively rare events. Two data centers or nodes within a data center may lose communication, but that’s rare compared to node failures (disk full, over-capacity, random crashes) that happen all the time. In the model of CAP, however, a single node failure isn’t a network failure and so it falls outside the rules of CAP. Eric Brewer has clarified though that in practice, nodes that fail to respond within some time window can be treated as conceptually equivalent to an actual partition event. This does make CAP more relevant in more practical discussions but it doesn’t make it any less confusing.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *