SQL is powerful and indexes can provide fast access even when you only know partially the value you are looking for.
For this example, I'll use a list of popular baby names born in the US between 1880 and 2009 collected by Hadley Wickham:
drop table if exists baby_names;
create table baby_names (
primary key (name, year, sex)
,year int , name text, percent float, sex text
);
\! wget -qc "https://github.com/hadley/data-baby-names/blob/master/baby-names.csv?raw=true"
\copy baby_names from 'baby-names.csv?raw=true' with ( skip 1, format csv )
;
The primary key starts with the name and is HASH sharded on it (the default in YugabyteDB is HASH on the first column):
yugabyte=# \d baby_names
Table "public.baby_names"
Column | Type | Collation | Nullable | Default
---------+------------------+-----------+----------+---------
year | integer | | not null |
name | text | | not null |
percent | double precision | | |
sex | text | | not null |
Indexes:
"baby_names_pkey" PRIMARY KEY, lsm (name HASH, year ASC, sex ASC)
A query for one name is fast, using the index:
yugabyte=# explain (analyze, dist, costs off)
select name, year, sex from baby_names
where name = 'Frank' order by year;
QUERY PLAN
--------------------------------------------------------------------------------------------
Index Scan using baby_names_pkey on baby_names (actual time=1.743..1.800 rows=188 loops=1)
Index Cond: (name = 'Frank'::text)
Storage Index Read Requests: 1
Storage Index Execution Time: 2.000 ms
Planning Time: 0.059 ms
Execution Time: 1.834 ms
Storage Read Requests: 1
Storage Write Requests: 0
Storage Execution Time: 2.000 ms
Peak Memory Usage: 8 kB
(10 rows)
However, a common use-case is when we enter only the beginning of the name. For example, with a name LIKE 'Fran%'
.
No index: Seq Scan
Without an index, the whole table is scanned:
yugabyte=# explain (analyze, dist, costs off)
select name, year, sex from baby_names
where name like 'Fran%' order by year;
Sort (actual time=118.913..118.959 rows=1420 loops=1)
Sort Key: year
Sort Method: quicksort Memory: 146kB
-> Seq Scan on baby_names (actual time=118.334..118.620 rows=1420 loops=1)
Remote Filter: (name ~~ 'Fran%'::text)
Storage Table Read Requests: 1
Storage Table Execution Time: 117.999 ms
Planning Time: 1.266 ms
Execution Time: 119.047 ms
Storage Read Requests: 1
Storage Write Requests: 0
Storage Execution Time: 117.999 ms
Peak Memory Usage: 254 kB
(13 rows)
This is efficient (120 milliseconds) getting many rows (rows=1420
) in one call to the storage (Read Requests: 1
) thanks to the Remote Filter
. Without this optimization (you can set yb_enable_expression_pushdown to false;
to disable it) it would take 254 read requests to fetch 258000 rows by batches of 1024 rows (--ysql_prefetch_limit
) to apply the filter in YSQL (the postgres backend of YugabyteDB) and discard most of them (Rows Removed by Filter: 256580
).
Even if it is fast, the time complexity is O(N) on the number of rows in the table. To be scalable, we want an index access for which the time complexity depends on the result only.
Range index: Index Scan
The like 'Fran%'
is a range query: it starts at 'Fran'
and stops at 'Frao'
(o
being the next value after n
in en_US.UTF-8
with lc_collate=C
). The query planner knows that, and can add a ((name >= 'Fran'::text) AND (name < 'Frao'::text))
condition.
My primary key index being hashed on name
it cannot be used for a range scan over multiple names. I'm creating a range index for it:
create index baby_names_range_name on baby_names ( name ASC , year ASC );
yugabyte=# explain (analyze, dist, costs off)
select name, year,sex from baby_names
where name like 'Fran%' order by year;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Sort (actual time=13.473..13.522 rows=1420 loops=1)
Sort Key: year
Sort Method: quicksort Memory: 146kB
-> Index Scan using baby_names_range_name on baby_names (actual time=9.891..13.202 rows=1420 loops=1)
Index Cond: ((name >= 'Fran'::text) AND (name < 'Frao'::text))
Remote Index Filter: (name ~~ 'Fran%'::text)
Storage Index Read Requests: 2
Storage Index Execution Time: 5.000 ms
Storage Table Read Requests: 2
Storage Table Execution Time: 8.000 ms
Planning Time: 2.966 ms
Execution Time: 13.605 ms
Storage Read Requests: 4
Storage Write Requests: 0
Storage Execution Time: 13.000 ms
Peak Memory Usage: 873 kB
(16 rows)
The time is faster even if there are more read requests (for the table and the index) because each request has read only the necessary rows.
Range index: Index Only Scan
If I need only the name and year, as they are in the index, it is even faster without the need to read the table:
yugabyte=# explain (analyze, dist, costs off)
select name, year from baby_names
where name like 'Fran%' order by year;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Sort (actual time=8.182..8.237 rows=1420 loops=1)
Sort Key: year
Sort Method: quicksort Memory: 115kB
-> Index Only Scan using baby_names_range_name on baby_names (actual time=5.287..7.926 rows=1420 loops=1)
Index Cond: ((name >= 'Fran'::text) AND (name < 'Frao'::text))
Remote Filter: (name ~~ 'Fran%'::text)
Heap Fetches: 0
Storage Index Read Requests: 2
Storage Index Execution Time: 8.000 ms
Planning Time: 3.053 ms
Execution Time: 8.315 ms
Storage Read Requests: 2
Storage Write Requests: 0
Storage Execution Time: 8.000 ms
Peak Memory Usage: 222 kB
(15 rows)
Here I have Storage Index Read Requests
but no Storage Table Read Requests
thanks to the Index Only Scan. I could have the same even when querying the sex
column if it was added to the index on baby_names ( name ASC , year ASC ) include ( sex );
GIN index with Trigrams
The above was a range scan because the prefix of the pattern was known ('Fran%
). If the first letters are unknown, like '%Fran%
, we are back to full table scan:
yugabyte=# explain (analyze, costs off)
select name, year from baby_names
where name like '%Fran%' order by year;
QUERY PLAN
-------------------------------------------------------------------------------
Sort (actual time=119.386..119.429 rows=1420 loops=1)
Sort Key: year
Sort Method: quicksort Memory: 115kB
-> Seq Scan on baby_names (actual time=118.954..119.137 rows=1420 loops=1)
Remote Filter: (name ~~ '%Fran%'::text)
Storage Table Read Requests: 1
Storage Table Execution Time: 117.998 ms
Planning Time: 2.233 ms
Execution Time: 119.996 ms
Storage Read Requests: 1
Storage Write Requests: 0
Storage Execution Time: 117.998 ms
Peak Memory Usage: 234 kB
(13 rows)
For this, the solution is indexing parts of the test with a GIN index. Those parts will be trigrams and the extension pg_trgm
adds a gin_trgm_ops
operator for the GIN index:
yugabyte=# create extension if not exists pg_trgm;
yugabyte=# create index baby_names_trg_name on baby_names
using gin ( name gin_trgm_ops);
yugabyte=# explain (analyze, costs off)
select name, year from baby_names
where name like '%Fran%' order by year;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Sort (actual time=11.024..11.075 rows=1126 loops=1)
Sort Key: year
Sort Method: quicksort Memory: 101kB
-> HashAggregate (actual time=10.637..10.803 rows=1126 loops=1)
Group Key: year, name
-> Index Scan using baby_names_trg_name on baby_names (actual time=6.670..10.273 rows=1420 loops=1)
Index Cond: (name ~~ '%Fran%'::text)
Rows Removed by Index Recheck: 70
Planning Time: 0.890 ms
Execution Time: 11.178 ms
Storage Read Requests: 0
Storage Write Requests: 0
Here 1490 rows (out of the 258000 rows in my table) have been accessed by index, with a few false positives that were removed after the index scan (Rows Removed by Index Recheck: 70
) to get the final rows=1420
.
More about Trigrams
In addition to indexing text by trigrams, pg_trgm
comes with interesting functions and YugabyteDB being PostgreSQL compatible, supports all of them. Here is an example:
yugabyte=# select distinct name <-> '%Fran%' as distance, name
from baby_names where name like '%Fran%' order by 1;
distance | name
----------+------------
0 | Fran
0.428571 | Franc
0.428571 | Frank
0.428571 | Franz
0.5 | Franco
0.555556 | Frances
0.555556 | Francis
0.555556 | Frankie
0.6 | Francina
0.6 | Franklyn
0.6 | Franklin
0.6 | Francine
0.6 | Francies
0.636364 | Francisca
0.636364 | Francesca
0.636364 | Francisco
0.636364 | Francesco
0.666667 | Francisqui
(18 rows)
Here, not only I can list names that contain a pattern, but I can also order them so that the closest to the pattern are displayed first. For example, if the name you are looking for was 'Francesca' or 'Francisca' you would have probably searched with a closer pattern, like '%Fran%a':
yugabyte=> select distinct name <-> '%Fran%a' as distance, name
from baby_names where name like '%Fran%a' order by 1;
distance | name
----------+-----------
0.666667 | Francina
0.692308 | Francesca
0.692308 | Francisca
(3 rows)
Ordering on the similarity distance with a limit can suggest the most probable values. If the user doesn't find it she can add more to the pattern.
What about CockroachDB?
This post was inspired by a user moving from CockroachDB to YugabyteDB. He was creating the index without mentioning the range sharding (the ASC
) because CRDB doesn't have the equivalent of YugabyteDB hash sharding and defaults to ASC. In YugabyteDB, the default is HASH on the first column, and the index scan for a prefixed LIKE pattern was not used.
With explicit range sharding (and it makes sense to explicitly define ASC or DESC as the performance of ORDER BY may be different on a backward scan) the index scan is possible as in the demo above. The same features as PostgreSQL are available in YugabyteDB, with horizontal scalability in addition to it.
Note that I tried to run the same example on CockroachDB but I was quickly limited by the lack of PostgreSQL compatibility:
root@localhost:26257/defaultdb> select version();
version
--------------------------------------------------------------------------------------
CockroachDB CCL v22.2.8 (x86_64-pc-linux-gnu, built 2023/04/17 13:22:08, go1.19.6)
(1 row)
# loading from a CSV file
root@localhost:26257/defaultdb> \copy baby_names from 'baby-names.csv?raw=true' with ( skip 1, format csv );
ERROR: at or near "baby-names.csv?raw=true": syntax error: unimplemented: this syntax
SQLSTATE: 0A000
DETAIL: source SQL:
copy baby_names from 'baby-names.csv?raw=true' with ( skip 1, format csv )
^
HINT: You have attempted to use a feature that is not yet implemented.
# creating a GIN index
root@localhost:26257/defaultdb> create index baby_names_trg_name on baby_names using gin ( name gin_trgm_ops);
ERROR: unimplemented: primary key column name cannot be present in an inverted index
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
# using PG_TRGM functions
root@localhost:26257/defaultdb> select distinct name <-> '%Fran%' as distance, name from baby_names where name like '%Fran%' order by 1;
invalid syntax: statement ignored: at or near "-": syntax error
SQLSTATE: 42601
DETAIL: source SQL:
select distinct name <-> '%Fran%' as distance, name from baby_names where name like '%Fran%' order by 1
^
HINT: try \h SELECT
The only thing that is working like PostgreSQL is the first example: prefixed LIKE condition on a regular index. I didn't have to create the index as the primary key is already range sharded (I could have done the same in YugabyteDB by creating the table with primary key ( name asc, year, sex )
).
YugabyteDB automatic sharding
All those are PostgreSQL commands and work the same in YugabyteDB. The only thing to take care is the sharding method, with the PostgreSQL compatible syntax ASC or DESC to define range sharding when you know you will have range queries. Do not worry about which value to split on: auto-splitting will do its job when the table grows. In the absence of range query access patterns, it is still better to use HASH sharding as it directly distributes the rows. The hash function (you can see it with yb_hash_code()
) returns a value from 0x0000 to 0xFFFF so that it can start with multiple shards (ranges of hash values) and can be split further.
If you don't know all your access patterns, looking at the datatype can help. A UUID or number generated from a sequence are probably good candidates for HASH. Timestamps, numbers with arithmetic semantic, and alphabetical text, will probably be queried by ranges and ordered. In the example above, name
and year
are in this category.