Snapshot too old in YugabyteDB

Franck Pachot - Apr 24 - - Dev Community

SQL databases store the current state and enough information to read about a previous state with Multi-Version Concurrency Control. Keeping all change records would not be scalable, so we only keep enough history for the oldest ongoing transaction. There are two possibilities when long ongoing transactions are running:

  • Let history grow. During VACUUM, PostgreSQL keeps old records that are more recent than the database transaction horizon, allowing bloat to persist.
  • Fail long transactions after a time limit, like Oracle undo_retention or YugabyteDB timestamp_history_retention_interval_sec

The second solution avoids runaway queries causing problems with other transactions. A growing MVCC history can impact performance and operations, affecting reads, memory, storage, and backups. Therefore, it's best to prevent MVCC history from growing uncontrollably. However, such an error must give enough information to understand the reason and fix the runaway query, and you should be able to understand it.

An example

I start YugabyteDB in a Docker container:

-bash-4.2# docker run --rm -it yugabytedb/yugabyte bash

[root@066127d97d21 yugabyte]# 

[root@066127d97d21 yugabyte]# yugabyted start

Starting yugabyted...
✅ YugabyteDB Started
✅ UI ready
✅ Data placement constraint successfully verified
...
+---------------------------------------------------------------------------------------------------------+
|                                                yugabyted                                                |
+---------------------------------------------------------------------------------------------------------+
| Status              : Running.                                                                          |
| Replication Factor  : 1                                                                                 |
| YugabyteDB UI       : http://172.17.0.3:15433                                                           |
| JDBC                : jdbc:postgresql://172.17.0.3:5433/yugabyte?user=yugabyte&password=yugabyte        |
| YSQL                : bin/ysqlsh -h 172.17.0.3  -U yugabyte -d yugabyte                                 |
| YCQL                : bin/ycqlsh 172.17.0.3 9042 -u cassandra                                           |
| Data Dir            : /root/var/data                                                                    |
| Log Dir             : /root/var/logs                                                                    |
| Universe UUID       : 640b90ac-c720-418a-bfd4-03f3eb106bab                                              |
+---------------------------------------------------------------------------------------------------------+
...

[root@066127d97d21 yugabyte]#

[root@066127d97d21 yugabyte]# ysqlsh -h $(hostname)
ysqlsh (11.2-YB-2.21.0.0-b0)
Type "help" for help.

yugabyte=#

Enter fullscreen mode Exit fullscreen mode

I create a demo table with one row:

yugabyte=# create table demo
 ( id bigserial primary key , value text )
;
CREATE TABLE

yugabyte=# insert into demo(value) 
 select 'Hello World' from generate_series(1,1)
;
INSERT 0 1

Enter fullscreen mode Exit fullscreen mode

I start a transaction, read from my table, and also show some time information, in Epoch, about the transaction time and the current time:

yugabyte=# begin transaction isolation level repeatable read
;
BEGIN

yugabyte=# select * from demo
;
 id | value
----+-------------
  1 | Hello World
(1 row)

yugabyte=# select 
   current_setting('yb_effective_transaction_isolation_level') 
 , extract(epoch from transaction_timestamp()) as tx 
 , extract(epoch from clock_timestamp()) as clock 
;
 current_setting |        tx        |      clock
-----------------+------------------+------------------
 repeatable read | 1713905945.54437 | 1713905960.63649
(1 row)
Enter fullscreen mode Exit fullscreen mode

Still in the transaction, I wait a few minutes and recheck the time. For this demo, I don't want to wait 15 minutes, which is the default, so I set it temporarily to 60 seconds (with timestamp_history_retention_interval_sec) and run a full compaction:

yugabyte=# select 
   current_setting('yb_effective_transaction_isolation_level') 
 , extract(epoch from transaction_timestamp()) as tx 
 , extract(epoch from clock_timestamp()) as clock 
;
 current_setting |        tx        |      clock
-----------------+------------------+------------------
 repeatable read | 1713905945.54437 | 1713906114.97461
(1 row)

yugabyte=# \! yb-ts-cli --server_address=$(hostname) set_flag --force timestamp_history_retention_interval_sec 60

yugabyte=# \! yb-ts-cli --server_address=$(hostname) compact_all_tablets

Successfully compacted all tablets

yugabyte=# \! yb-ts-cli --server_address=$(hostname) set_flag --force timestamp_history_retention_interval_sec 900

Enter fullscreen mode Exit fullscreen mode

I did not make any updates, but the database doesn't know. Had I made any modifications, the versions from before the compaction time minus 60 seconds would have been deleted. The database cannot guarantee a consistent read since my transaction's read time is from before that time. The snapshot required for the reading is too old.

yugabyte=# select 
   current_setting('yb_effective_transaction_isolation_level') 
 , extract(epoch from transaction_timestamp()) as tx 
 , extract(epoch from clock_timestamp()) as clock 
;
 current_setting |        tx        |      clock
-----------------+------------------+------------------
 repeatable read | 1713905945.54437 | 1713906169.19518
(1 row)

yugabyte=# select * from demo;
ERROR:  Snapshot too old. Read point: { physical: 1713905951675923 }, earliest read time allowed: { physical: 1713906057639729 }, delta (usec): 105963806: kSnapshotTooOld

Enter fullscreen mode Exit fullscreen mode

The Snapshot too old message gives essential information: the read point (the start of my transaction as I'm at a Repeatable Read isolation level) and the earliest time allowed (the retention time when compaction occurred). The delta is the difference between the two and gives an idea of the minimum retention that would have been required to run this query.

Here is the timeline with the numbers above:

1713905945.54437  Transaction time
1713905951.675923 Read point (snapshot)  <-----------------+ Snapshot
1713905960.63649  Clock at transaction start               | Too Old
1713906057.639729 Earliest read time allowed    <-----+    | 
1713906114.97461  Clock before compaction             |    |   
        (timestamp_history_retention_interval_sec) ---+    |        
1713906169.19518  Clock before SELECT error                |
                           (transaction read time) --------+ 
Enter fullscreen mode Exit fullscreen mode

When you encounter the "Snapshot Too Old" error, you should check if the duration of the statement/transaction is expected and then increase the MVCC retention. If it is the query that has a problem, you should fix it first.

yb_read_time

Note that in the latest version, you can set the read time yourself with yb_read_time. It is similar to Oracle's flashback query. Be careful, and it should be used cautiously for specific use cases, like recovery from errors.

Here is an example with my table:

yugabyte=# select * from demo;
 id |    value
----+-------------
  1 | Hello World
Enter fullscreen mode Exit fullscreen mode

I get the current time as an Epoch (in microseconds):

yugabyte=# select (1000000*extract(epoch from now()))::bigint now;
\gset
       now
------------------
 1713965275094928
(1 row)

yugabyte=# \gset
Enter fullscreen mode Exit fullscreen mode

I use this variable to set the read time for this flashback query

yugabyte=# set yb_read_time=:now;
NOTICE:  00000: yb_read_time should be set with caution.
DETAIL:  No DDL operations should be performed while it is set and it should not be set to a timestamp before a
DDL operation has been performed. It does not have well defined semantics for normal transactions and is only to
be used after consultation
SET
Enter fullscreen mode Exit fullscreen mode

A warning says it does not follow the SQL transaction semantics. That is because we cannot apply the current transaction changes to a past state.

I insert a new row and query my table:

yugabyte=# insert into demo(value)
            select 'Hello Again' from generate_series(1,1)
;
INSERT 0 1

yugabyte=# select * from demo;
 id |    value
----+-------------
  1 | Hello World
(1 row)
Enter fullscreen mode Exit fullscreen mode

The new row is invisible because my read point is before the insert.

I can see my row if I revert to the standard SQL behavior where the read time is the beginning of the transaction or statement:

yugabyte=# set yb_read_time=0;

NOTICE:  00000: yb_read_time should be set with caution.
DETAIL:  No DDL operations should be performed while it is set and it should not be set to a timestamp before a
DDL operation has been performed. It does not have well defined semantics for normal transactions and is only to
be used after consultation
SET

yugabyte=# select * from demo;

 id |    value
----+-------------
  2 | Hello Again
  1 | Hello World
(2 rows)
Enter fullscreen mode Exit fullscreen mode

The warning also says that it should not be set to before DDL. If I go too far in time, there are no rows displayed because the table didn't exist at that time (but the query is still parsed with the current catalog):

yugabyte=# set yb_read_time=1713879482182794;

NOTICE:  00000: yb_read_time should be set with caution.
DETAIL:  No DDL operations should be performed while it is set and it should not be set to a timestamp before a DDL operation has been performed. It does not have well defined semantics for normal transactions and is only to be used after consultation
SET

yugabyte=# select * from demo;
 id | value
----+-------
(0 rows)
Enter fullscreen mode Exit fullscreen mode

I'll let you try a date in the future. It will show the current version because the database is unaware of future changes.

To summarize

"Snapshot Too Old" error is expected for queries that run more than the MVCC retention. It defaults to 15 minutes, but can be increased with timestamp_history_retention_interval_sec. Note that this retention value can be internally increased to support database snapshots, to allow for Point In Time Recovery to any time between two snapshots. Additionally, the read point can be set to query as-of a past point-in-time. This can be used to recover from errors, in the current database or a clone.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .