ERROR: Timed out: BackfillIndex RPC

Franck Pachot - Jun 3 '22 - - Dev Community

You may have encountered this error when creating an index on a large YugabyteDB index. The reason is that, as many operations, there's a timeout which, by default one hour (set as a tserver flag --backfill_index_client_rpc_timeout_ms=3600000). I'm writing this as of version 2.12.5 but there are good chances that we increase the default. Here is a quick test to show the behavior. I'm, starting a YugabyteDB database with all defaults:

docker run -d --name yb -p7000:7000 -p9000:9000 -p5433:5433 \
yugabytedb/yugabyte:2.12.5.0-b24 bin/yugabyted start \
--daemon=false 
Enter fullscreen mode Exit fullscreen mode

I create a small table but using a slow() function that takes 1 second to execute, and create an index on this function:

docker exec -it yb ysqlsh -e -h localhost
\timing on
create table demo as select generate_series(1,100000) n;
create function slow(val int) returns int as $$ begin
 perform pg_sleep(1);
 return val;
end ; $$ immutable language plpgsql;
create index demoi on demo (slow(n));
Enter fullscreen mode Exit fullscreen mode

This fails after an hour:

ERROR:  Timed out: BackfillIndex RPC (request call id 429) to 0.0.0.0:9100 timed out after 3600.000s
Time: 3602273.955 ms (01:00:02.274)
Enter fullscreen mode Exit fullscreen mode

You will see it as invalid and must drop it before trying again:

select tablename, indexname, indisvalid 
from pg_index
natural join (
select oid indexrelid, relname indexname from pg_class
) i
natural join (
select oid indrelid, relname tablename from pg_class
) t where not indisvalid;

 tablename | indexname | indisvalid
-----------+-----------+------------
 demo      | demoi     | f
(1 row)

Enter fullscreen mode Exit fullscreen mode

backfill_index_client_rpc_timeout_ms

If you want to test it, on a lab, you can set a very small value, like 10 seconds here:

docker rm -f yb
docker run -d --name yb -p7000:7000 -p9000:9000 -p5433:5433 \
yugabytedb/yugabyte:2.12.5.0-b24 bin/yugabyted start \
--tserver_flags="backfill_index_client_rpc_timeout_ms=10000" \
--daemon=false 
Enter fullscreen mode Exit fullscreen mode

The index creation fails after 10 seconds:

create index demoi on demo (slow(a));
ERROR:  Timed out: BackfillIndex RPC (request call id 2600) to 0.0.0.0:9100 timed out after 10.000s
Time: 12264.869 ms (00:12.265)
Enter fullscreen mode Exit fullscreen mode

increase the setting

The goal of backfill is to be able to create indexes fully online, concurrently with normal activity, and it is a good idea to increase this timeout to a larger value. For example backfill_index_client_rpc_timeout_ms=604800000 will set it to seven days.

If a CREATE INDEX is taking long, you will see it in pg_stat_activity and pg_stat_statements:

postgres=# select application_name, backend_start, xact_start, query_start, state_change, state, query from pg_stat_activity;

 application_name |         backend_start         |          xact_start           |          query_start          |         state_change          | state  |                                                       query
------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+--------+--------------------------------------------------------------------------------------------------------------------
 ysqlsh           | 2022-06-03 13:38:34.562643+00 | 2022-06-03 13:39:21.46686+00  | 2022-06-03 13:39:21.46686+00  | 2022-06-03 13:39:21.4669+00   | active | create index demoi on demo (slow(n));
                  | 2022-06-03 14:03:29.545489+00 | 2022-06-03 14:03:29.579138+00 | 2022-06-03 14:03:29.579138+00 | 2022-06-03 14:03:29.579175+00 | active | BACKFILL INDEX 16388 WITH x'0880011a00' READ TIME 6775863557200748544 PARTITION x'';
                  | 2022-06-03 14:14:29.561329+00 | 2022-06-03 14:14:29.579093+00 | 2022-06-03 14:14:29.579093+00 | 2022-06-03 14:14:29.579152+00 | active | BACKFILL INDEX 16388 WITH x'0880011a00' READ TIME 6775863557200748544 PARTITION x'';
 psql             | 2022-06-03 14:18:31.204116+00 | 2022-06-03 14:20:09.833529+00 | 2022-06-03 14:20:09.833529+00 | 2022-06-03 14:20:09.833599+00 | active | select application_name, backend_start, xact_start, query_start, state_change, state, query from pg_stat_activity;
                  | 2022-06-03 13:38:26.754005+00 |                               |                               |                               |        |
(5 rows)

postgres=# select calls, total_time, query from pg_stat_statements;

 calls |   total_time   |                                                       query
-------+----------------+-------------------------------------------------------------------------------------------------------------------
     5 | 5127796.745189 | BACKFILL INDEX 16388 WITH x'0880011a00' READ TIME 6775863557200748544 PARTITION x''
     4 |      24.885011 | select * from pg_stat_activity
     1 |    3963.828692 | create table demo as select generate_series(1,100000) n
     1 |      29.585054 | create or replace function slow(val int) returns int as $$ begin                                                 +
       |                |  perform pg_sleep(1);                                                                                            +
       |                |  return val;                                                                                                     +
       |                | end ; $$ immutable language plpgsql
     1 |       0.087609 | select calls, total_time, query from pg_stat_statements
     1 |       0.303569 | select * from pg_stat_statements
     1 |       0.086394 | select calls, query from pg_stat_statements
     4 |       0.243348 | select application_name, backend_start, xact_start, query_start, state_change, state, query from pg_stat_activity
(8 rows)
Enter fullscreen mode Exit fullscreen mode

In addition to the DDL itself, you will find the BACKFILL INDEX internal operations, running concurrently on multiple partitions.

Note that never version 2.8 some other timeouts were also applied to the index backfill, but this is not the case anymore. The backfill_index_client_rpc_timeout_ms is the only one, and its default will probably be set higher in future versions, but always good to have a look at it before creating indexes on large databases.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .