MongoDB Majority Read Concern

One common misconception of mongos read concern: majority is that it’s reading from a majority of nodes. This is understandable because it’s counterpart write concern: majority requires acks from the majority of nodes. But that’s not at all what read concern does.

Reads always get submitted to a single node using a server selection process that takes into account your read preference (primary, primaryPreferred, secondary, etc). If you have a primary read preference, a read will always go to the primary. If you have a secondary read preference, a read will get submitted to a single secondary.

When the read concern is set to majority, that’s saying “only return data for this query that has been committed / successfully written to the majority of nodes”. This does not mean that you’re always reading the latest write. The node you’re reading from may not have the majority-committed version of the data you’re looking for. It may still reflect the previous value, instead of the latest value that perhaps has not yet propagated to the majority of nodes. What it does mean is that the data you’re reading has a high level of durability because in the event of a failure the value you’re reading is unlikely to be rolled back since it’s been majority committed.

The majority commit value for any write is determined by the primary during the standard replication process. When data gets replicated to a secondary node, it’ll check with the primary whether to update it’s “majority commit” snapshot of the data. If that value has not been majority committed, the majority commit snapshot will maintain its previous value. That’s why it’s still possible to read stale values with read: majority. The majority commit snapshot on any given node is only updated for a particular value when it is actually successfully replicated from the primary to the majority of its secondary nodes.

Read your own writes

Reading your own writes is a special case of causal consistency and having a majority read concern is a key component of read your own write consistency in Mongo.

To achieve reading your own writes, you need to ensure the following settings:

  1. Operations are done inside a session with causal consistency enabled
  2. Write concern is majority
  3. Read concern is majority

Why do you need to set specific read and write concerns even though causal consistency is enabled? I’m not sure, but it’s an extremely confusing and misleading API. Causal consistency theoretically includes read your own writes consistency, but in MongoDB enabling causal consistency is not sufficient for reading your own writes!

If you do have these settings on, MongoDB will track operations with a global logical clock and your reads will block until it’s able to read the most recent majority committed write from the same session. Without causal consistency enabled, a write may go to a majority of nodes but the read may still would up returning non-majority-committed data from a node that does not have the write that just happened in the same session. The causal consistency session is what causes reads to block if it attempts to read a stale write.

Leave a Reply

Your email address will not be published. Required fields are marked *