Azure Cosmos DB offers multiple levels of data consistency. Before exploring these levels, let's review what consistency means in the context of distributed NoSQL databases.
In a distributed environment, consistency describes the uniformity of data that is replicated across multiple nodes (sometimes physically separated by thousands of kilometers) for redundancy. In this case, consistency becomes a challenge due to factors such as network latency, node failures, and concurrent updates.
Consistency models in distributed NoSQL databases define clear rules and guarantees for how data is read and written across multiple replicas or partitions. These models aim to strike a balance between providing strong guarantees of data consistency and availability on the one hand and maintaining system performance and scalability on the other.
Levels
Let's take the example of a student database. Suppose we are in Europe and we update the email address of a student. Right after the update, someone in North America queries that specific record. What value are they going to see?
Eventual Consistency
Eventual consistency is the weakest form of consistency because a client may read values that are older than ones it has already read. In our example, there is a chance the person in North America will not yet see the new value. Over time, the value is replicated to all of the different nodes and replica sets, and *eventually* the data becomes consistent. This level is ideal for applications that do not require any ordering guarantees, for example counts of retweets, likes, or non-threaded comments.
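To make this concrete, here is a minimal sketch of the behaviour described above. It is a toy model written for this article, not how Cosmos DB is implemented: the `Replica` class, the version numbers, and the email addresses are all assumptions made for illustration.

```python
import random


class Replica:
    """Toy replica holding one versioned value (not Cosmos DB internals)."""

    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Updates may arrive in any order, so keep only the newest version.
        if version > self.version:
            self.version, self.value = version, value


writes = [(1, "old@uni.example"), (2, "new@uni.example")]
europe, north_america = Replica(), Replica()

# Europe has both writes; North America has received only the first one.
for write in writes:
    europe.apply(*write)
north_america.apply(*writes[0])

# A client that reads Europe first and North America second goes "back in time".
reads_before = [europe.value, north_america.value]

# Replication later delivers all writes everywhere, in an arbitrary order.
rng = random.Random(0)
for replica in (europe, north_america):
    delivery = writes[:]
    rng.shuffle(delivery)
    for write in delivery:
        replica.apply(*write)

reads_after = [europe.value, north_america.value]
```

Before replication finishes, the second read returns an older value than the first. Afterwards both replicas agree, whatever order the updates arrived in.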
Consistent Prefix
This level guarantees that updates are read in the order in which they were written: a reader never sees writes out of order, although the data it sees is not always current. In our case, the person in North America might not see the latest version, but whatever they see is consistent as of a specific point in time. It is suitable for workloads that can afford some lag but require high availability with low latency.
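The ordering guarantee can be sketched with a toy replica that holds back any write that arrives too early. The `PrefixReplica` class and its sequence numbers are hypothetical names chosen for this illustration, not part of any Cosmos DB API.

```python
class PrefixReplica:
    """Toy replica that applies writes strictly in sequence order."""

    def __init__(self):
        self.log = []       # writes applied so far, in order
        self.pending = {}   # writes that arrived early, keyed by sequence number

    def receive(self, seq, value):
        self.pending[seq] = value
        # Apply only the next expected write, so the log never has gaps.
        while len(self.log) + 1 in self.pending:
            self.log.append(self.pending.pop(len(self.log) + 1))

    def read(self):
        return list(self.log)


replica = PrefixReplica()

replica.receive(3, "c@uni.example")   # arrives early, so it is held back
after_early = replica.read()

replica.receive(1, "a@uni.example")
after_first = replica.read()

replica.receive(2, "b@uni.example")   # fills the gap, so write 3 is applied too
after_all = replica.read()
```

Every read returns a prefix of the full write history: the reader may be behind, but it never sees the third update without the first two.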
Session Consistency
In Azure Cosmos DB, setting the consistency level to Session means that reads are consistent within a single client session, so a session is guaranteed to read its own writes. Session is the default consistency level for Azure Cosmos DB accounts, and it is suitable for e-commerce applications, social media apps, and applications that require persistent user connections. In our scenario, if the person in North America queries the data within the same session that performed the update, they will see the updated email address, because all reads within a session reflect the effects of all previous writes within that session. A reader in a different session has no such guarantee and may still see the old address.
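One way to picture read-your-own-writes is a session that remembers how far its own writes have progressed and only accepts replicas that have caught up. The sketch below is a simplified model under that assumption; the `Session` class, the `token` field, and the replica names are hypothetical and do not reproduce the real SDK.

```python
class Replica:
    """Toy replica with a log sequence number (LSN) for its last applied write."""

    def __init__(self, name):
        self.name = name
        self.lsn = 0
        self.data = {}

    def apply(self, lsn, key, value):
        self.lsn = lsn
        self.data[key] = value


class Session:
    """Hypothetical client session that tracks its own session token."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.token = 0   # highest LSN this session has written or read

    def write(self, primary, key, value):
        lsn = primary.lsn + 1
        primary.apply(lsn, key, value)
        self.token = max(self.token, lsn)

    def read(self, key):
        # Only accept a replica that has caught up to this session's token.
        for replica in self.replicas:
            if replica.lsn >= self.token:
                self.token = max(self.token, replica.lsn)
                return replica.name, replica.data.get(key)
        raise RuntimeError("no replica has caught up to the session token yet")


europe, north_america = Replica("europe"), Replica("north_america")

# The writing session tries the lagging North America replica first.
session = Session([north_america, europe])
session.write(europe, "student-1", "new@uni.example")
served_by, email = session.read("student-1")

# A different session starts with an empty token and gets no such guarantee.
other_session = Session([north_america, europe])
other_served_by, other_email = other_session.read("student-1")
```

The writing session skips the replica that has not yet received its update, while the second session happily reads from it and does not see the new address.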
Bounded Staleness
This level doesn't guarantee that we will get the most recent version, but it guarantees that we will get a fairly recent one: reads lag behind writes by no more than a configured staleness window. It accepts a bounded delay in exchange for consistency that comes close to Strong. We can specify the maximum lag as a time interval or as a number of operations. In our scenario, a query from North America issued inside the staleness window may return either the old or the updated email address, depending on how far replication from Europe has progressed. Once the staleness window has passed since the update, the query is guaranteed to return the updated address, because the data being read is never allowed to fall further behind than the configured bound.
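The two limits can be expressed as a simple check. This is only a sketch of the rule, assuming a replica's lag can be measured both in versions and in seconds; `StalenessBound`, `within_bound`, and the numbers used are illustrative, not values or names taken from Cosmos DB.

```python
from dataclasses import dataclass


@dataclass
class StalenessBound:
    max_versions: int    # maximum number of writes a read may lag behind
    max_seconds: float   # maximum time a read may lag behind


def within_bound(bound, primary_lsn, replica_lsn, seconds_behind):
    """Return True if the replica's lag respects both configured limits."""
    versions_behind = primary_lsn - replica_lsn
    return (versions_behind <= bound.max_versions
            and seconds_behind <= bound.max_seconds)


# Illustrative limits only: at most 5 writes or 10 seconds behind.
bound = StalenessBound(max_versions=5, max_seconds=10.0)

slightly_behind = within_bound(bound, primary_lsn=12, replica_lsn=10, seconds_behind=3.0)
too_many_versions = within_bound(bound, primary_lsn=20, replica_lsn=10, seconds_behind=3.0)
too_old = within_bound(bound, primary_lsn=12, replica_lsn=10, seconds_behind=30.0)
```

A replica that is two writes and three seconds behind is acceptable, while one that breaks either limit is not. In a real system something has to happen at that point (for example, slowing down writes until replication catches up) so that the bound keeps holding.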
Strong Consistency
This level guarantees that reads return the most recent committed version of an item. Strong consistency is suitable for applications that cannot tolerate any data loss due to downtime. In our scenario, if we update the email address of a student in Europe and someone in North America then queries the data with the consistency level set to Strong, they will definitely see the updated email address. This is because reads at this level always reflect the most recent committed write, regardless of the location of the reader or the writer.
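As a rough mental model, we can imagine a store that acknowledges a write only after every replica has applied it. This is a deliberate simplification for illustration, not a description of the replication protocol Cosmos DB actually uses; the `StrongStore` class and region names are hypothetical.

```python
class Replica:
    """Toy replica holding a key-value map."""

    def __init__(self):
        self.data = {}


class StrongStore:
    """Toy store that acknowledges a write only after all replicas apply it."""

    def __init__(self, replicas):
        self.replicas = replicas   # mapping of region name to Replica

    def write(self, key, value):
        # Apply synchronously everywhere before returning to the caller.
        for replica in self.replicas.values():
            replica.data[key] = value
        return True   # acknowledgement

    def read(self, region, key):
        return self.replicas[region].data.get(key)


store = StrongStore({"europe": Replica(), "north_america": Replica()})

acknowledged = store.write("student-1", "new@uni.example")
europe_read = store.read("europe", "student-1")
north_america_read = store.read("north_america", "student-1")
```

Because the write does not return until both regions hold the new value, a read from either region afterwards sees it. The price is that every write has to wait for the slowest replica.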
The consistency levels in Azure Cosmos DB determine how soon data that has been written to the database becomes readable by other operations. Each level offers a different trade-off between consistency and performance.
P.S.
I would love to hear from our tech-savvy community! Have insights, tips, or burning questions? Don't keep them to yourself! Drop your thoughts in the comments below. Let's spark a conversation and learn from each other's experiences in the dynamic world of Azure migration.