Even a Better Index 🍃 for The Most Annoying Postgres Optimizer Fail

Franck Pachot - Dec 11 '23 - - Dev Community

In The Most Annoying Optimizer Fail in Postgres ✅ Best index solved it, I would not say "Best" because we can do better!

The query's filter IN(1,2,3) caused some inefficiency in my index. Although the filter was applied during the scan, my index still had to read and discard some entries. It's worth noting that PostgreSQL can only read one range from an Index Scan, whereas YugabyteDB can read multiple ranges. However, it doesn't preserve the order from the leading index key with loose index scans. In the case of this particular LIMIT 1 query, keeping the order was more important than filtering a third of the rows based on their status.

John Page pointed out that MongoDB optimizes queries with fewer than 200 ranges by using a sort merge:

Sorting multiple sorted ranges can be accomplished through a relatively inexpensive operation called Sort Merge. This process merges the ranges while maintaining the sorted order of the resulting set.

Test Case

Please find below the SQL script needed to recreate the example in the previous blog post:

create table demo (
  primary key(id)
  , id uuid, seq bigint, status int, value text
);

--- rows inserted with `seq` higher than zero
insert into demo
 select gen_random_uuid(), generate_series(1,1000000), 10*random(), 'x'
;

--- rows inserted with `seq` equal to zero and higher UUID
insert into demo (id, seq, status, value)
 select ('f'||substr(gen_random_uuid()::text,2))::uuid , 0, 10*random(), 'x' from demo;

-- gather optimizer statistics and, if PostgreSQL, vacuum, if YugabyteDB, enable cost based optimizer
select version() like '%-YB-%' as is_yb \gset
\if :is_yb
 set yb_enable_optimizer_statistics=on;
 set yb_enable_base_scans_cost_model=on;
 analyze demo;
\else
 vacuum analyze demo;
\endif
Enter fullscreen mode Exit fullscreen mode

Merge Append with Union All

Let's attempt to accomplish the same task using either PostgreSQL or YugabyteDB.

I created the index as mentioned by John:

yugabyte=# create index demo_seq_status_id 
           on demo (seq , status, id ASC);

CREATE INDEX
Enter fullscreen mode Exit fullscreen mode

YugabyteDB can perform a loose index scan, but the result is sorted by status,id instead of id, which requires reading and sorting all rows before returning the Top-1 row.

yugabyte=# explain (analyze, costs off, summary off)
 select id,seq,status from demo where seq=0 and status in (1,2,3)
 order by id limit 1;

                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Limit (actual time=319.992..319.993 rows=1 loops=1)
   ->  Sort (actual time=319.990..319.991 rows=1 loops=1)
         Sort Key: id
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Index Only Scan using demo_seq_status_id on demo (actual time=1.523..239.137 rows=299856 loops=1)
               Index Cond: ((seq = 0) AND (status = ANY ('{1,2,3}'::integer[])))
               Heap Fetches: 0
Enter fullscreen mode Exit fullscreen mode

PostgreSQL's Merge Append operation can interleave multiple sorted result sets and preserve order. This is typically used with a UNION ALL and requires query rewriting:

yugabyte=# explain (analyze, dist, costs off)
(select id,seq,status from demo where seq=0 and status=1 order by id )
union all
(select id,seq,status from demo where seq=0 and status=2 order by id )
union all
(select id,seq,status from demo where seq=0 and status=3 order by id )
order by id limit 1;

                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Limit (actual time=2.159..2.161 rows=1 loops=1)
   ->  Merge Append (actual time=2.158..2.158 rows=1 loops=1)
         Sort Key: demo.id
         ->  Index Only Scan using demo_seq_status_id on demo (actual time=0.963..0.963 rows=1 loops=1)
               Index Cond: ((seq = 0) AND (status = 1))
               Heap Fetches: 0
               Storage Index Read Requests: 1
               Storage Index Read Execution Time: 0.827 ms
         ->  Index Only Scan using demo_seq_status_id on demo demo_1 (actual time=0.614..0.614 rows=1 loops=1)
               Index Cond: ((seq = 0) AND (status = 2))
               Heap Fetches: 0
               Storage Index Read Requests: 1
               Storage Index Read Execution Time: 0.549 ms
         ->  Index Only Scan using demo_seq_status_id on demo demo_2 (actual time=0.578..0.578 rows=1 loops=1)
               Index Cond: ((seq = 0) AND (status = 3))
               Heap Fetches: 0
               Storage Index Read Requests: 1
               Storage Index Read Execution Time: 0.511 ms
 Planning Time: 0.215 ms
 Execution Time: 2.218 ms
 Storage Read Requests: 3
 Storage Read Execution Time: 1.888 ms
 Peak Memory Usage: 56 kB
Enter fullscreen mode Exit fullscreen mode

In my query, I had to include the ORDER BY clause for each subquery. However, the LIMIT 1 clause was pushed down, which is an efficient way of reading only one row (rows=1) for each Index Scan. However, in YugabyteDB, since the index is distributed, it had to read a total of 3 rows, with 3 read requests, before discarding two of them and keep the first one.

We could consider implementing this merge operation in the YugabyteDB index loose scan, without having to rewrite the query. However, it would be complex to achieve this when the index is distributed. It would require a first merge within each database mode, followed by another merge in the SQL layer. Additionally, the gain from this approach may be limited as it still has to read multiple scattered index entries, one per value.

One Range Scan with Partial Index

PostgreSQL and YugabyteDB offer a possibility to filter on this IN() condition with a partial index:

yugabyte=# drop index demo_seq_status_id;

DROP INDEX

yugabyte=# create index demo_seq_id_status123
           on demo (seq, id ASC) where status in (1,2,3)
;

CREATE INDEX
Enter fullscreen mode Exit fullscreen mode

With such index there is no need to rewrite the original query. The index will directly find the right row.

yugabyte=# explain (analyze, dist, costs off)
 select id,seq,status from demo where seq=0 and status in (1,2,3)
 order by id limit 1;

                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Limit (actual time=1.302..1.303 rows=1 loops=1)
   ->  Index Scan using demo_seq_id_status123 on demo
         (actual time=1.300..1.301 rows=1 loops=1)
         Index Cond: (seq = 0)
         Storage Table Read Requests: 1
         Storage Table Read Execution Time: 0.617 ms
         Storage Index Read Requests: 1
         Storage Index Read Execution Time: 0.493 ms
 Planning Time: 0.137 ms
 Execution Time: 1.346 ms
 Storage Read Requests: 2
 Storage Read Execution Time: 1.110 ms
 Peak Memory Usage: 24 kB
(17 rows)
Enter fullscreen mode Exit fullscreen mode

During an Index Read, we were able to filter out enough data to only need to read one row. This is because the condition on status was immediately verified - the index only stores entries that match the condition. After this, we performed a range scan on seq ordered by id. The first entry that was read from this scan is our desired result.

By using a covering index, we can perform an Index Only Scan without reading the table:

yugabyte=# drop index demo_seq_id_status123;

DROP INDEX

yugabyte=# create index demo_seq_id_status123_value
           on demo (seq, id ASC)
           include (status, value)
           where status in (1,2,3)
;

CREATE INDEX
Enter fullscreen mode Exit fullscreen mode

Here is the best we can do, getting directly the one row for the result in only one index read request:

yugabyte=# explain (analyze, dist, debug, costs off)
 select id,seq,status from demo where seq=0 and status in (1,2,3)
 order by id limit 1;

                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Limit (actual time=1.024..1.025 rows=1 loops=1)
   ->  Index Only Scan using demo_seq_id_status123_value on demo (actual time=1.022..1.022 rows=1 loops=1)
         Index Cond: (seq = 0)
         Heap Fetches: 0
         Storage Index Read Requests: 1
         Storage Index Read Execution Time: 0.891 ms
         Metric rocksdb_number_db_seek: 1.000
         Metric rocksdb_number_db_next: 2.000
         Metric rocksdb_number_db_seek_found: 1.000
         Metric rocksdb_number_db_next_found: 2.000
         Metric rocksdb_iter_bytes_read: 301.000
         Metric docdb_keys_found: 2.000
Enter fullscreen mode Exit fullscreen mode

I utilized the debug option of explain analyze to obtain more details regarding a single read request. The request involved one rocksdb_number_db_seek to move to the beginning of the range, and two rocksdb_number_db_next to retrieve the first index entry. It is the best way to fetch the single row.

What is the best index?

When working with a SQL database, the aim of indexing is to enhance the performance of your queries while minimizing the number of indexes required. It is advisable to create a single index that can help optimize multiple queries, rather than creating a separate index for each query. This approach not only simplifies your database structure, but also reduces the maintenance overhead associated with managing multiple indexes.

To quickly filter rows based on the seq column with an equality condition and keep the order by id, we can use any of the following three indexes. To additionally filter on a list of values in the seq column, still preserving the order, there are multiple possibilities:

  • (seq, id ASC, status) as shown in the previous post may immediately find the first row or it may need to read multiple index entries until it finds a matching status in the IN() list. It depends on the frequency of those status value. Including the status in the index is important to filter during the index scan rather than with random reads to the table.

  • (seq, status, id ASC) will require reading one index range per value in the list. In PostgreSQL or YugabyteDB the query must be written with a UNION ALL to use it. This could be a cheap merge for a few values on the same node, but more costly with many values in a distributed database.

  • (seq, id ASC) include (status) where status in (1,2,3) will read only one index entry and never mode than that. It is the best index for this query but also necessitate to create an index dedicated this specific list of values.

The decision depends on the business meaning of this list of statuses and how they are used by one or multiple queries.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .