Can writes be blocked by reads in YugabyteDB?

Franck Pachot - Oct 29 '23 - - Dev Community

TL;DR: No, except when explicitly needed by the business transaction. In this case, you have the choice between Serializable isolation level or explicit locking at lower levels.


MVCC (Multi-Version Concurrency Control) databases offer an important advantage - reads do not block writes. However, I was reading about Global Tables implementation in CockroachDB (a similar case that we solved with Duplicate Covering Indexes in YugabyteDB), and I was surprised to learn that they need to implement a different write behavior to achieve non-blocking transactions. One limitation of CockroachDB is that it only supports one isolation level, Serializable, with readers blocking writes. In contrast, YugabyteDB supports all SQL isolation levels because it is PostgreSQL-compatible, and most SQL applications are built for Read Committed.

An example

It's time to explain the YugabyteDB isolation levels in the context of read-write conflicts. I've started a cluster with read committed and wait queues enabled:

$ curl -s http://tserver:9000/varz?raw | grep -E "wait_queue|read_committed"
--enable_wait_queues=true
--yb_enable_read_committed_isolation=true
--TEST_sleep_before_entering_wait_queue_ms=0
--wait_queue_poll_interval_ms=100
Enter fullscreen mode Exit fullscreen mode

By default, a transparent retry or a serializable error is raised on conflicts. However, in this demo, I enable wait queues and will wait for a short duration of 3 seconds to check if I am in a blocking situation or not.

I'll be using a simple table that consists of two rows and two columns:

yugabyte=# create table demo ( id bigint primary key, col1 int , col2 int );
CREATE TABLE
yugabyte=# insert into demo(id,col1,col2) values (1,1,1),(2,2,2);
INSERT 0 2
yugabyte=# select * from demo;
 id | col1 | col2
----+------+------
  2 |    2 |    2
  1 |    1 |    1
(2 rows)
Enter fullscreen mode Exit fullscreen mode

I'll explain read-write conflicts later, as those involve isolation levels. However, to begin, I'll show some write-write conflicts.

Write-Write conflicts

In one session, I begin a transaction and update a column. While still in this transaction, I start another one (running another psql with \!) where I update the same column in autocommit mode:

yugabyte=# begin transaction isolation level read committed;
BEGIN

yugabyte=*# update demo set col1=col1+10 where id=1;
UPDATE 1

-- in another session:
yugabyte=*# \!  psql 
yugabyte=# set statement_timeout=3000;
SET
yugabyte=# update demo set col1=col1+100 where id=1;
ERROR:  Perform RPC (request call id 307) to 10.0.0.39:9100 timed out after 5.000s
Enter fullscreen mode Exit fullscreen mode

We must wait for other transactions to be completed before writing to the same row and column. The statement timeout was pushed down to the distributed storage to raise an RPC timeout.

In the PostgreSQL database, if we write to different rows, the issue mentioned in the previous statement does not occur because locks are at row level. It is the same for YugabyteDB:

yugabyte=# begin transaction isolation level read committed;
BEGIN

yugabyte=*# update demo set col1=col1+10 where id=2;
UPDATE 1

-- in another session:
yugabyte=*# \!  psql 
yugabyte=# set statement_timeout=3000;
SET
yugabyte=# update demo set col1=col1+100 where id=1;
UPDATE 1

-- the second session was in auto-commit
yugabyte=# quit;
-- back to the first session

yugabyte=*# commit;
COMMIT
yugabyte=# select * from demo;
 id | col1 | col2
----+------+------
  2 |   12 |    2
  1 |  101 |    1
(2 rows)
Enter fullscreen mode Exit fullscreen mode

YugabyteDB has a finer granularity for locks at the column level. There's no conflict when writing to the same row but different columns:

yugabyte=# begin transaction isolation level read committed;
BEGIN

yugabyte=*# update demo set col2=col2+10 where id=1;
UPDATE 1

-- in another session:
yugabyte=*# \!  psql 
yugabyte=# set statement_timeout=3000;
SET
yugabyte=# update demo set col1=col1+100 where id=1;
UPDATE 1

-- the second session was in auto-commit
yugabyte=# quit;
-- back to the first session

yugabyte=*# commit;
COMMIT
yugabyte=# select * from demo;
 id | col1 | col2
----+------+------
  2 |    2 |    2
  1 |  101 |   11
(2 rows)
Enter fullscreen mode Exit fullscreen mode

This feature is unique to YugabyteDB and allows for higher concurrency than PostgreSQL. Vlad Mihalcea has written a blog post on this topic:

YugabyteDB column-level locking - Vlad Mihalcea

Learn how YugabyteDB can scale writes by employing a column-level locking approach, as opposed to RDBMS row-level locking.

favicon vladmihalcea.com

Those examples involved write-write conflicts that occur at any isolation level. Knowing that the conflicts happen for the same row and column, I will test for read-write conflicts where one session is reading (SELECT) and the update is writing (UPDATE).

Read-Write conflict in Serializable

When ensuring serializable transactions, detecting read-write conflicts is necessary to avoid anomalies. This is not the default in PostgreSQL nor YugabyteDB:

yugabyte=# begin transaction isolation level serializable;
BEGIN

yugabyte=*# select col1 from demo where id=1;

 col1
------
    1
(1 row)

-- in another session:
yugabyte=*# \!  psql 
yugabyte=# set statement_timeout=3000;
SET
yugabyte=# update demo set col1=col1+100 where id=1;
ERROR:  Perform RPC (request call id 66) to 10.0.0.39:9100 timed out after 5.000s
Enter fullscreen mode Exit fullscreen mode

In the Serializable isolation level, a read can block a concurrent write, which means it has to either wait or fail. This must happen when your transaction relies on the state you have read to update different rows. If the state changes within a transaction, it is no longer serializable. Therefore, this is the only scenario where a read can block a write.

In YugabyteDB, just like in PostgreSQL, there are other isolation levels available. This allows for better concurrency when transactions don't depend on the exact state that was read.

Read-Write conflict in Repeatable Read

Snapshot Isolation in MVCC databases ensures read consistency without blocking, and that's sufficient for repeatable reads:

yugabyte=# begin transaction isolation level repeatable read;
BEGIN

yugabyte=*# select col1 from demo where id=1;

 col1
------
    1
(1 row)

-- in another session:
yugabyte=*# \!  psql 
yugabyte=# set statement_timeout=3000;
SET
yugabyte=# update demo set col1=col1+100 where id=1;
UPDATE 1

-- the second session was in auto-commit
yugabyte=# quit;
-- back to the first session

yugabyte=*# select * from demo;
 id | col1 | col2
----+------+------
  2 |    2 |    2
  1 |    1 |    1
(2 rows)

yugabyte=*# commit;
COMMIT

yugabyte=# select * from demo;
 id | col1 | col2
----+------+------
  2 |    2 |    2
  1 |  101 |    1
(2 rows)
Enter fullscreen mode Exit fullscreen mode

The first transaction ignores concurrent changes (but may block if it tries to update those rows because of the write-write conflict). The read operation itself doesn't block other sessions.

Read-Write conflict with SELECT FOR SHARE

Of course, even if you don't run in the Serializable mode, you can still perform blocking reads for the few transactions that require it with explicit locking, like SELECT FOR SHARE:

yugabyte=# begin transaction isolation level repeatable read;
BEGIN

yugabyte=*# select col1 from demo where id=1 for share;

 col1
------
    1
(1 row)

-- in another session:
yugabyte=*# \!  psql 
yugabyte=# set statement_timeout=3000;
SET
yugabyte=# update demo set col1=col1+100 where id=1;
ERROR:  Perform RPC (request call id 60) to 10.0.0.39:9100 timed out after 5.000s
Enter fullscreen mode Exit fullscreen mode

One way to prevent anomalies with isolation levels lower than serializable is to use explicit locking, a per-transaction decision made by the developer. This is what most SQL applications are using.

Read-Write conflict in Read Committed

Although Serializable and Repeatable Read isolation levels are designed to prevent conflicts when multiple transactions access the same data concurrently, they may still raise serializable errors, which can cause issues for the application. Therefore, the application must implement a retry logic to handle such errors.

In contrast, Read Committed isolation level uses MVCC snapshots at the statement level rather than the transaction level. This approach allows YugabyteDB to automatically rollback and restart the statement at a newer read time, so that the application doesn't have to handle the retry logic manually.

Here is the same read-write conflict in Read Committed, where readers do not block writers:

yugabyte=# begin transaction isolation level read committed;
BEGIN

yugabyte=*# select col1 from demo where id=1;

 col1
------
    1
(1 row)

-- in another session:
yugabyte=*# \!  psql 
yugabyte=# set statement_timeout=3000;
SET
yugabyte=# update demo set col1=col1+100 where id=1;
UPDATE 1

-- the second session was in auto-commit
yugabyte=# quit;
-- back to the first session

yugabyte=*# select * from demo;
 id | col1 | col2
----+------+------
  2 |    2 |    2
  1 |  101 |    1
(2 rows)

yugabyte=*# commit;
COMMIT

yugabyte=# select * from demo;
 id | col1 | col2
----+------+------
  2 |    2 |    2
  1 |  101 |    1
(2 rows)

Enter fullscreen mode Exit fullscreen mode

When a transaction re-reads rows, the difference between Read Committed and other isolation levels becomes apparent. In Read Committed, the changes committed by another transaction are visible to the first transaction. This is because the read point of MVCC snapshot differs for each statement within the same transaction. However, you can avoid non-repeatable reads with implicit locking (SELECT FOR SHARE), which blocks writers while readers access the data.

To Summarize

YugabyteDB writers (UPDATE) wait only in these cases:

  • another transaction is updating the same row and same column
  • another transaction has read the same row in Serializable
  • another transaction has locked the same row with SELECT for SHARE/UPDATE

For OLTP applications, providing non-blocking transactions is essential. Readers must not block writers, except when explicitly required by the business transaction. This approach has been the key to the success of Oracle Database, as compared to other database systems like DB2 or SQL Server, which implemented isolation levels with read locks, rather than MVCC Snapshot Isolation. This feature becomes even more important for Distributed SQL, where scaling OLTP is one of the major benefits.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .