CREATE INDEX in YugabyteDB: online or fast?

Franck Pachot - Oct 7 '21 - - Dev Community

When you create an index, you want the creation to be either:

  • fast, blocking concurrent writes to get a stable version of the rows until the creation is completed, usually with a share lock on the table
  • online, which means that it doesn't block, or even slow down, concurrent operation. This requires additional synchronization steps and is slower. However, slower is not a problem when nobody waits on it.

On traditional databases, we are used to pessimistic locking, and blocking-DDL statements is the default. Online index creation is an option, and sometimes with the high cost of Enterprise License. The reason is also legacy: online operations were implemented later, when applications became 24/7. But YugabyteDB is new, is free, and, even if supporting all SQL operations, has its roots in hyper-scale databases where you do not wait on locks. The default is to allow concurrent DML when creating an index. According to the above classification, this means that the CREATE INDEX may appear slow. This not a problem when creating an index online, as the application is still running. However there are some situations where you know you are alone on the system and want the fastest index creation. Especially when you have many indexes to create because you are importing a dump, or running a full schema DDL. CI/CD may also re-ceate the schema frequently and you definitely don't want the overhead to synchronize with potential concurrent DML.

As an example, I've run the following:

{
echo "drop table if exists demo;"
echo "create table demo (id int primary key " ,c{1..100}" int" ");  "
echo "\\timing on"
for i in c{1..100} ; do echo "create index demo$i on demo($i);" ; done
} | psql 
Enter fullscreen mode Exit fullscreen mode

This 100 index creation ran in 1 hour, with each creation taking between 30 and 60 seconds. The reason is that, to support backfill of concurrent DML, there are multiple changes that need to be in sync for all yb-tserver. This goes through yb-master heartbeats, which introduces some waits. Note that I'm writing this on version 2.9 and it may be improved in the future.

BACKFILL

On another run, looking at pg_stat_statements shows the INDEX BACKFILL which is short, around 50 milliseconds, but the CREATE INDEX is around 40 seconds. Because there are a lot of wait operations for this online index creation:

vd -f json http://localhost:13000/statements


 query                                                                                              | calls#| total_time%|↑min_time%| max_time%| mean_time%| stddev_time%| rows#║
 select pg_stat_statements_reset()                                                                  |     1 |       0.18 |     0.18 |     0.18 |      0.18 |        0.00 |    1 ║
 BACKFILL INDEX 66055 WITH x'0880011a00' READ TIME 6690842023934689280 PARTITION x'aaaa'            |     1 |      38.29 |    38.29 |    38.29 |     38.29 |        0.00 |    0 ║
 BACKFILL INDEX 66053 WITH x'0880011a00' READ TIME 6690841697617080320 PARTITION x'aaaa'            |     1 |      45.90 |    45.90 |    45.90 |     45.90 |        0.00 |    0 ║
 BACKFILL INDEX 66054 WITH x'0880011a00' READ TIME 6690841855848988672 PARTITION x'aaaa'            |     1 |      46.97 |    46.97 |    46.97 |     46.97 |        0.00 |    0 ║
 BACKFILL INDEX 66057 WITH x'0880011a00' READ TIME 6690842368179253248 PARTITION x'aaaa'            |     1 |      54.83 |    54.83 |    54.83 |     54.83 |        0.00 |    0 ║
 BACKFILL INDEX 66056 WITH x'0880011a00' READ TIME 6690842190767943680 PARTITION x'aaaa'            |     1 |      55.29 |    55.29 |    55.29 |     55.29 |        0.00 |    0 ║
 BACKFILL INDEX 66052 WITH x'0880011a00' READ TIME 6690841540903989248 PARTITION x'aaaa'            |     1 |      70.18 |    70.18 |    70.18 |     70.18 |        0.00 |    0 ║
 create table demo (id int primary key  ,c1 int ,c2 int ,c3 int ,c4 int ,c5 int ,c6 int ,c7 int ,c8…|     1 |    5454.30 |  5454.30 |  5454.30 |   5454.30 |        0.00 |    0 ║
 create index democ2 on demo(c2)                                                                    |     1 |   37898.21 | 37898.21 | 37898.21 |  37898.21 |        0.00 |    0 ║
 create index democ3 on demo(c3)                                                                    |     1 |   38600.98 | 38600.98 | 38600.98 |  38600.98 |        0.00 |    0 ║
 create index democ5 on demo(c5)                                                                    |     1 |   40794.47 | 40794.47 | 40794.47 |  40794.47 |        0.00 |    0 ║
 create index democ4 on demo(c4)                                                                    |     1 |   40883.21 | 40883.21 | 40883.21 |  40883.21 |        0.00 |    0 ║
 create index democ1 on demo(c1)                                                                    |     1 |   41132.44 | 41132.44 | 41132.44 |  41132.44 |        0.00 |    0 ║
 create index democ6 on demo(c6)                                                                    |     1 |   45125.51 | 45125.51 | 45125.51 |  45125.51 |        0.00 |    0 ║

Enter fullscreen mode Exit fullscreen mode

Note that BACKFILL INDEX is an internal command. You cannot run it yourself, but it is monitored by pg_stat_statements.

NONCONCURRENTLY

The solution when you are alone on the database, creating indexes on empty or small tables, is to disable backfill. The index will still be created on a snapshot view of the table (which is empty in my case) but without trying to backfill concurrent transactions. This is fine here as I know I have no concurrent DML.

{
echo "drop table if exists demo;"
echo "create table demo (id int primary key " ,c{1..100}" int" ");  "
echo "\\timing on"
for i in c{1..100} ; do echo "create index nonconcurrently demo$i on demo($i);" ; done
} | psql 
Enter fullscreen mode Exit fullscreen mode

This 100 index creation ran in 4 minutes.

This is for no-DML only!

NONCONCURRENTLY must be used only when you can guarantee that no concurrent DML will happen on the table. This will not be verified by the database engine, as this is the way to run it faster. It is different from an offline index creation, like in PostgreSQL without the CONCURRENTLY keyword, because the table is not locked in YugabyteDB. This means that CREATE INDEX NONCONCURRENTLY will leave the index in an incorrect state if there are modifications to the underlying table. And this is not visible from pg_index.

So this is to be used only to load a schema without other activity on it.

I want to be sure you know the consequences. Here is an example of bad things that can happen if you incorrectly mention NONCONCURRENTLY:

poc=# create table demo (a int ,b int);
CREATE TABLE
poc=# \! for i in {1000..1005} ; do psql -c "insert into demo select n,n from generate_series(1,1000) n" ; done &
poc=# 

INSERT 0 1000
Enter fullscreen mode Exit fullscreen mode

This runs 5 times a 1000 rows insert with same value for demo.a and demo.b, and the first iteration already occurred.

create index nonconcurrently demoa on demo(a);

INSERT 0 1000

CREATE INDEX
Enter fullscreen mode Exit fullscreen mode

I've created the index with nonconcurrently when, obviously, there was some DML at the same time (we see that another insert of 1000 rows happened during that time)

poc=# select /*+ SeqScan(demo) */ count(*),sum(a),sum(b) from demo where a>0;

 count |   sum   |   sum
-------+---------+---------
  2000 | 1001000 | 1001000
(1 row)

Enter fullscreen mode Exit fullscreen mode

The Seq Scan on the table shows the 2000 rows, which is right given the feedback of two INSERT 0 1000.

However, from the index I see only a snapshot from the point in time where the CREATE INDEX happened:

poc=# select /*+ IndexScan(demo demoa) */ count(*),sum(a),sum(b) from demo where a>0;
 count |  sum   |  sum
-------+--------+--------
  1000 | 500500 | 500500
(1 row)
Enter fullscreen mode Exit fullscreen mode

Then a new iteration occurs:

poc=# 

INSERT 0 1000

poc=# 
Enter fullscreen mode Exit fullscreen mode

Now, here is what is seen from the table (Seq Scan) and index (Index Scan):

poc=# select /*+ SeqScan(demo) */ count(*),sum(a),sum(b) from demo where a>0;

 count |   sum   |   sum
-------+---------+---------
  3000 | 1501500 | 1501500
(1 row)

poc=# select /*+ IndexScan(demo demoa) */ count(*),sum(a),sum(b) from demo where a>0;

 count |   sum   |   sum
-------+---------+---------
  2000 | 1001000 | 1001000
(1 row)
Enter fullscreen mode Exit fullscreen mode

The index continues to be maintained, but missed the rows that were inserted during the create, because there were no backfill.

Unfortunately, in the current version (2.9) there's no indication about it, so be careful:

poc=# \x
Expanded display is on.
poc=# select relname,indisvalid,indisready,indislive from (select oid indexrelid, relname from pg_class where relname like 'demoa') r natural join pg_index i;

-[ RECORD 1 ]--+------
relname        | demoa
indisvalid     | t
indisready     | t
indislive      | t
Enter fullscreen mode Exit fullscreen mode

In summary:

There's a reason why online index creation is the default: it guarantees the consistency of data whatever happens on your database.

Yes, you may create small indexes faster with NONCONCURRENTLY, especially on empty tables, and there are cases for that (restore, migration, CI/CD) but it is your responsibility to ensure that there are no concurrent DML. Note that even without the concern of concurrent DML, creating indexes on large tables will be faster with the default behavior (backfill) because it will be parallelized (based on the number of cores).

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .