Consistent reads go to the tablet leader to work on the current state, being guaranteed to see the latest writes, maybe waiting for concurrent writes to be committed.
In a geo-distributed deployment, where network latency can be higher than the expected response time, you may prefer to read from the nearest replica, leader or follower, to get faster response time.
When follower reads are allowed, YugabyteDB still reads from a consistent snapshot, but accepting a bounded staleness defined by yb_follower_read_staleness_ms
which defaults to 30 seconds. All replicas are guaranteed to be consistent with this snapshot, even without reading from other followers, for two reasons: they exchange heartbeats (by default every 500ms), and the maximum clock skew is known (by default 500ms).
A staleness of 15 or 30 seconds is often acceptable for real-time analytics or reporting where you query a virtual snapshot without making any some reservations (locks).
This behavior is enabled in YugabyteDB by two settings:
- the transaction must be declared as read-only, so that it cannot compromise write consistency. Basically, in an MVCC database, "read-only" is an isolation level that doesn't require any lock acquisition or read intents record.
-
yb_read_from_followers
is set toon
to explicitly allow the possible staleness because that is different from the monolithic PostgreSQL behavior of read-only transactions (which depends on the replication mode and settings). It explicitly allow a staleness ofyb_follower_read_staleness_ms
before the beginning of the query.
Dedicated connection pool
To set those, it is recommended to deploy with a dedicated connection pool with those initialization settings. Here are some examples:
- SET on connection initialization declared in the connection pool configuration:
set yb_read_from_followers to on;
set default_transaction_read_only to on;
- PGOPTION as environnement variable:
PGOPTIONS="-c yb_read_from_followers=on -c default_transaction_read_only=on"
- PGOPTION in the JDBC URL:
jdbc:postgresql://localhost:5433/yugabyte?options=-c%20yb_read_from_followers=on%20-c%20default_transaction_read_only=on
- dedicate a user for it:
alter user reader_role set yb_read_from_followers to on;
alter user reader_role set default_transaction_read_only to on;
Typically you need at least two connection pools in a CQRS architecture: one for the consistent reads and writes, and one for the queries.
Setting per transaction when sharing the same connection
Multiple connection pools will increase the number of connections and it is also a good practice to limit the number of connections and, in some cases, you want to allow follower reads for a few specific queries only.
Setting those two parameters before the query, and back to their initial value after it, may require additional roundtrips to the server. It is also not easy to maintain: forgetting to set it back, in the normal scenario or after an exception, would be a critical issue.
This blog post explores some way to set those parameters per query without increasing the number of roundtrips.
How to test it
Before trying some alternatives, I need a reliable way to test it. I create a simple table:
create table demo as select generate_series(1,1000) x;
and use my YBWR script:
\! curl -s https://raw.githubusercontent.com/FranckPachot/ybdemo/main/docker/yb-lab/client/ybwr.sql | grep -v '\watch' > ybwr.sql
\i ybwr.sql
This defines two prepared statements, snap_reset
and snap_table
to display the read statistics on the tablet (seek()
and next()
in the LSM-Tree).
First, I set yb_read_from_followers=on
without setting a read-only transaction. This reads from the leaders (L
) which are distributed to multiple nodes (10.0.0.39
, 10.0.0.40
, 10.0.0.41
):
yugabyte=# execute snap_reset;
ybwr metrics
--------------
(0 rows)
Time: 141.791 ms
yugabyte=# set yb_read_from_followers=on;
SET
Time: 12.080 ms
yugabyte=# select count(*) from demo where x between 1 and 42 ;
count
-------
42
(1 row)
Time: 13.412 ms
yugabyte=# execute snap_table;
rocksdb_seek | rocksdb_next | rocksdb_insert | dbname / relname / tserver / tabletid / leader
--------------+--------------+----------------+------------------------------------------------------------
1 | 667 | | yugabyte demo 0d4b4b64a93c43239d0ca8ed21286458 L 10.0.0.41
1 | 645 | | yugabyte demo 49b822c41fe34c059e0e24788810012a L 10.0.0.40
1 | 685 | | yugabyte demo aa45e3e86eaa41ca9254100fbe4b8b43 L 10.0.0.39
(3 rows)
Time: 450.742 ms
This has read from different nodes, where the table Leader (L
) is. In this case (Sequential Scan on few rows) there's one seek()
per node, which means that the response time will include the latency to each node, once per execution.
Now, if I do the same in a read-only transaction (with yb_read_from_followers
stills set to on
):
yugabyte=# execute snap_reset;
ybwr metrics
--------------
(0 rows)
Time: 141.791 ms
yugabyte=# set default_transaction_read_only=on;
SET
Time: 12.080 ms
yugabyte=# select count(*) from demo where x between 1 and 42 ;
count
-------
42
(1 row)
Time: 13.412 ms
yugabyte=# set default_transaction_read_only=off;
SET
Time: 12.069 ms
rocksdb_seek | rocksdb_next | rocksdb_insert | dbname / relname / tserver / tabletid / leader
--------------+--------------+----------------+------------------------------------------------------------
1 | 667 | | yugabyte demo 0d4b4b64a93c43239d0ca8ed21286458 10.0.0.39
1 | 645 | | yugabyte demo 49b822c41fe34c059e0e24788810012a 10.0.0.39
1 | 685 | | yugabyte demo aa45e3e86eaa41ca9254100fbe4b8b43 L 10.0.0.39
(3 rows)
Time: 450.742 ms
In this 3 nodes cluster with Replication Factor 3, there's a replica on each node. I have read from the Leader only when it was near the node I'm connected to (10.0.0.39
) and from the followers for the other, still reading from my local node. This validates the read from follower behavior.
I can also test the consequence, that with those settings, a read-only transaction reads from a virtual snapshot taken from 30 seconds ago:
yugabyte=# \dconfig yb*follower*
List of configuration parameters
Parameter | Value
-------------------------------+-------
yb_follower_read_staleness_ms | 30000
yb_read_from_followers | on
(2 rows)
yugabyte=# select count(*) from demo;
count
-------
1000
(1 row)
yugabyte=# insert into demo values(0);
INSERT 0 1
yugabyte=# select count(*) from demo;
count
-------
1001
(1 row)
yugabyte=# set default_transaction_read_only=on;
SET
yugabyte=# select count(*) from demo;
count
-------
1000
(1 row)
yugabyte=# \! sleep 30
yugabyte=# select count(*) from demo;
count
-------
1001
(1 row)
yugabyte=# set default_transaction_read_only=off;
SET
yugabyte=#
Now that I know how to test, I can show the different ways to set those parameters for a single query.
⚠️ the /*+ Set() */ hint is not a solution
I'll start with a way that may seem to work in some situation but only because of some side effects when setting the read point. YugabyteDB installs the pg_hint_plan
which allows a Set()
hint to set a parameter for one query. However, this is set only during the parsing phase, and is relevant only for query planner parameters. Reading from followers is a behavior of the Executor and is not controlled by pg_hint_plan
.
Doing so was used in some situation, and even got its way into some documentation examples, by mistake. It was working as a side effect only, because the read time is pro-actively set during the parse phase and, in some limited cases (atomic transaction, no prepared statement, no generic plan) doesn't change at execution time.
Basically there is no way in PostgreSQL to set a parameter for the scope of a query only and I'm not aware of any extension doing this. There was a proposal in 2011 (proposal: set GUC variables for single query) with some discussions about a possible syntax and some doubts about the value of it in addition to the transaction scope with SET LOCAL.
Explicit transaction with SET LOCAL
SET LOCAL is in theory the best solution but it requires an explicit transaction:
start transaction;
set local transaction_read_only to on;
set local yb_read_from_followers to on;
select count(*) from demo where x between 1 and 42;
commit;
I don't like this solution because I consider that read-only must be at transaction level. Would it make sense to set this parameter back and forth in the same transaction? No, and you would get ERROR: cannot set transaction read-write mode inside a read-only transaction
Explicit transaction with READ ONLY
If yb_read_from_followers=on
is already set, it is much better to define READ ONLY at the scope of the transaction:
start transaction read only;
select count(*) from demo where x between 1 and 42;
commit;
However, as your goal is a single-query, you don't want 3 roundtrips to the PostgreSQL backend to run this. It is not possible to use an autocommit statement (or you would have to set default_transaction_read_only
and then be sure to set it back after).
You can send the explicit transaction in a single SQL command if your driver is able to get the query result from there. this is possible with psql
or ysqlsh
but this requires that the ;
is escaped (\;
) because psql
interprets ;
as a command separator before sending it as a statement separator. The difference is visible when you set \timing on
as it shows the time for each command:
yugabyte=# \timing on
Timing is on.
yugabyte=# start transaction read only ;
START TRANSACTION
Time: 12.096 ms
yugabyte=*# select count(*) from demo where x between 1 and 42 ;
count
-------
42
(1 row)
Time: 13.410 ms
yugabyte=*# commit;
COMMIT
Time: 12.069 ms
yugabyte=#
yugabyte=#
yugabyte=# start transaction read only \;
yugabyte-# select count(*) from demo where x between 1 and 42 \;
yugabyte-# commit;
START TRANSACTION
count
-------
42
(1 row)
COMMIT
Time: 13.756 ms
You should test what happens with the drivers you use, For example do you know what JDBC sends when you setReadOnly(true)
or JPA and Spring with @Transactional(readOnly = true)
? How many calls to the database? Some client library will also allow to batch multi-statement commands so that you can send the start transaction
and the select
in one execution.
In pg_stat_statements
you will see multiple statements in both cases, because query
is a statement there, but in pg_stat_activity
you will see the whole command in query
.
ALTER FUNCTION ... SET
Another possibility to reduce the calls to the database is encapsulating your query as a function. There you can define parameters with the scope of the function execution:
create or replace function demo_query(a int, b int)
returns setof demo as $SQL$
select * from demo where x between a and b;
$SQL$ language sql
set transaction_read_only to on
-- cannot be set here (see https://github.com/yugabyte/yugabyte-db/issues/20482)
-- set yb_read_from_followers=on
;
This reads from followers in one atomic transaction if yb_read_from_followers
is set before calling it (see issue #20482):
yugabyte=# set yb_read_from_followers=on
SET
yugabyte=# select count(*) from demo_query(1, 42) ;
count
-------
42
(1 row)
Time: 16.308 ms
yugabyte=# execute snap_table;
rocksdb_seek | rocksdb_next | rocksdb_insert | dbname / relname / tserver / tabletid / leader
--------------+--------------+----------------+------------------------------------------------------------
1 | 669 | | yugabyte demo 0d4b4b64a93c43239d0ca8ed21286458 10.0.0.39
1 | 645 | | yugabyte demo 49b822c41fe34c059e0e24788810012a 10.0.0.39
1 | 685 | | yugabyte demo aa45e3e86eaa41ca9254100fbe4b8b43 L 10.0.0.39
(3 rows)
Time: 454.406 ms
This works but you have to write a query with parameters. As far as I know there is no possibility to do the same with a view.
There is also no possibility to do the same with a DO block as it cannot return a result.
Other settings
I mentioned default_transaction_isolation
and transaction_read_only
but there are equivalent syntaxes:
within a session you can
set default_transaction_read_only=on
orset session characteristics as transaction read only
to set the transaction mode for the next transactions (it doesn't try to change the current one if you have already started one - and some drivers with auto commit of issue a begin before).within a transaction, you can
set transaction_read_only=on
orset transaction read only
. You don't need SET LOCAL as this is a characteristic of the current transaction and the next one will use the default, but I think it is good to make it explicit.
Be careful, some drivers disable Auto Commit by starting a transaction before the call. Then setting the default for the next transaction will not be applied for the next statements.
- my preference goes to
start transaction read only
as it clearly associate the mode with the transaction. It is the same asbegin read only
orbegin work read only
orbegin work transaction read only
. I often typebegin
as it is shorter but the SQL starndard isstart transaction
.
If you know a simpler way to set a parameter for a single query in PostgreSQL, please let me know.