You may have encountered this error when creating an index on a large YugabyteDB index. The reason is that, as many operations, there's a timeout which, by default one hour (set as a tserver flag --backfill_index_client_rpc_timeout_ms=3600000
). I'm writing this as of version 2.12.5 but there are good chances that we increase the default. Here is a quick test to show the behavior. I'm, starting a YugabyteDB database with all defaults:
docker run -d --name yb -p7000:7000 -p9000:9000 -p5433:5433 \
yugabytedb/yugabyte:2.12.5.0-b24 bin/yugabyted start \
--daemon=false
I create a small table but using a slow()
function that takes 1 second to execute, and create an index on this function:
docker exec -it yb ysqlsh -e -h localhost
\timing on
create table demo as select generate_series(1,100000) n;
create function slow(val int) returns int as $$ begin
perform pg_sleep(1);
return val;
end ; $$ immutable language plpgsql;
create index demoi on demo (slow(n));
This fails after an hour:
ERROR: Timed out: BackfillIndex RPC (request call id 429) to 0.0.0.0:9100 timed out after 3600.000s
Time: 3602273.955 ms (01:00:02.274)
You will see it as invalid and must drop it before trying again:
select tablename, indexname, indisvalid
from pg_index
natural join (
select oid indexrelid, relname indexname from pg_class
) i
natural join (
select oid indrelid, relname tablename from pg_class
) t where not indisvalid;
tablename | indexname | indisvalid
-----------+-----------+------------
demo | demoi | f
(1 row)
backfill_index_client_rpc_timeout_ms
If you want to test it, on a lab, you can set a very small value, like 10 seconds here:
docker rm -f yb
docker run -d --name yb -p7000:7000 -p9000:9000 -p5433:5433 \
yugabytedb/yugabyte:2.12.5.0-b24 bin/yugabyted start \
--tserver_flags="backfill_index_client_rpc_timeout_ms=10000" \
--daemon=false
The index creation fails after 10 seconds:
create index demoi on demo (slow(a));
ERROR: Timed out: BackfillIndex RPC (request call id 2600) to 0.0.0.0:9100 timed out after 10.000s
Time: 12264.869 ms (00:12.265)
increase the setting
The goal of backfill is to be able to create indexes fully online, concurrently with normal activity, and it is a good idea to increase this timeout to a larger value. For example backfill_index_client_rpc_timeout_ms=604800000
will set it to seven days.
If a CREATE INDEX is taking long, you will see it in pg_stat_activity
and pg_stat_statements
:
postgres=# select application_name, backend_start, xact_start, query_start, state_change, state, query from pg_stat_activity;
application_name | backend_start | xact_start | query_start | state_change | state | query
------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+--------+--------------------------------------------------------------------------------------------------------------------
ysqlsh | 2022-06-03 13:38:34.562643+00 | 2022-06-03 13:39:21.46686+00 | 2022-06-03 13:39:21.46686+00 | 2022-06-03 13:39:21.4669+00 | active | create index demoi on demo (slow(n));
| 2022-06-03 14:03:29.545489+00 | 2022-06-03 14:03:29.579138+00 | 2022-06-03 14:03:29.579138+00 | 2022-06-03 14:03:29.579175+00 | active | BACKFILL INDEX 16388 WITH x'0880011a00' READ TIME 6775863557200748544 PARTITION x'';
| 2022-06-03 14:14:29.561329+00 | 2022-06-03 14:14:29.579093+00 | 2022-06-03 14:14:29.579093+00 | 2022-06-03 14:14:29.579152+00 | active | BACKFILL INDEX 16388 WITH x'0880011a00' READ TIME 6775863557200748544 PARTITION x'';
psql | 2022-06-03 14:18:31.204116+00 | 2022-06-03 14:20:09.833529+00 | 2022-06-03 14:20:09.833529+00 | 2022-06-03 14:20:09.833599+00 | active | select application_name, backend_start, xact_start, query_start, state_change, state, query from pg_stat_activity;
| 2022-06-03 13:38:26.754005+00 | | | | |
(5 rows)
postgres=# select calls, total_time, query from pg_stat_statements;
calls | total_time | query
-------+----------------+-------------------------------------------------------------------------------------------------------------------
5 | 5127796.745189 | BACKFILL INDEX 16388 WITH x'0880011a00' READ TIME 6775863557200748544 PARTITION x''
4 | 24.885011 | select * from pg_stat_activity
1 | 3963.828692 | create table demo as select generate_series(1,100000) n
1 | 29.585054 | create or replace function slow(val int) returns int as $$ begin +
| | perform pg_sleep(1); +
| | return val; +
| | end ; $$ immutable language plpgsql
1 | 0.087609 | select calls, total_time, query from pg_stat_statements
1 | 0.303569 | select * from pg_stat_statements
1 | 0.086394 | select calls, query from pg_stat_statements
4 | 0.243348 | select application_name, backend_start, xact_start, query_start, state_change, state, query from pg_stat_activity
(8 rows)
In addition to the DDL itself, you will find the BACKFILL INDEX internal operations, running concurrently on multiple partitions.
Note that never version 2.8 some other timeouts were also applied to the index backfill, but this is not the case anymore. The backfill_index_client_rpc_timeout_ms
is the only one, and its default will probably be set higher in future versions, but always good to have a look at it before creating indexes on large databases.