Loose Index Scan aka Skip Scan in PostgreSQL 🐘🚀

Franck Pachot - Jul 26 '22 - - Dev Community

In a previous post, I was looking at the solutions proposed by @ryanbooz in his Select the Most Recent Record (of Many Items) With PostgreSQL , and how they map to YugabyteDB. Ryan works with Timescale which implements Index Skip Scan transparently for a DISTINCT ON (truck_id). He mentions the PostgreSQL alternative with Recursive CTE documented as Loose indexscan but also that the biggest drawback with this approach is that it's much more difficult to return multiple columns of data. This shows a DISTINCT returning only one column.

Here is my version, working with additional columns, for the fastest answer to the Most recent record (of many items) when you have the right index.

I'm creating the same tables as in the previous post (and Ryan Booz article):

drop table truck_readings;

create table truck_readings (
 truck_id bigint 
 ,ts timestamptz
 ,milage int
 ,primary key(truck_id, ts)
);
Enter fullscreen mode Exit fullscreen mode

I am using YugabyteDB here, open-source distributed SQL database which is compatible with PostgreSQL. The syntax is the same but the rows are distributed, by default, by hash on the first column.

I insert some rows: ten thousand readings for each of the two thousand trucks.

with trucks as (select generate_series(1,2000) truck_id)
insert into truck_readings(truck_id, ts, milage)
select truck_id, now() - 
 interval '1 second' * generate_series(1,10000)
 , 42 from trucks;
analyze truck_readings;
Enter fullscreen mode Exit fullscreen mode

At this point, to get the last reading for each truck, we have the same solutions as in the previous post. A list of trucks helps, and maintaining the last value by a trigger is one efficient solution.

For this post, I use a technique similar to an Index Skip Scan, but driven from SQL by a WITH RECURSIVE clause. In PostgreSQL, which is not distributed, or YugabyteDB when you sharded by range, you don't need another index. But if you are hash sharding, or simply when you have a generated primary key, you need a secondary index. Here it is:

create index truck_last_reading on truck_readings 
( truck_id asc, ts asc) include (milage);
Enter fullscreen mode Exit fullscreen mode

This index is ordered by truck_id first, and then by ts, the timestamp. It includes all columns I need. Here I want the mileage for the last reading of each truck.

Note: if you wonder if it is a good idea to have milage as include... it depends. Very good remarks from Michael and Nikolay at
https://youtu.be/8wQ9NHQTQZQ?t=1059

My goal is to start at the end of the index so that I have the last reading for the last truck:

yugabyte=#

  select
   last_truck_last_reading.truck_id,
   last_truck_last_reading.milage
   from truck_readings last_truck_last_reading
  order by
   last_truck_last_reading.truck_id desc,
   last_truck_last_reading.ts desc
  limit 1;

 truck_id | milage
----------+--------
     2000 |     42
(1 row)
Enter fullscreen mode Exit fullscreen mode

Then skip to the next one:

yugabyte=#

  select truck_id,milage from truck_readings
  where truck_id < 2000
  order by truck_id desc, ts desc limit 1;

 truck_id | milage
----------+--------
     1999 |     42
Enter fullscreen mode Exit fullscreen mode

I can combine them with a join to get the truck_id from the previous one:

with truck_last_reading as (
  select
   last_truck_last_reading.truck_id,
   last_truck_last_reading.milage
   from truck_readings last_truck_last_reading
  order by
   last_truck_last_reading.truck_id desc,
   last_truck_last_reading.ts desc
  limit 1
)
 select
  next_truck_last_reading.truck_id,
  next_truck_last_reading.milage
 from truck_last_reading, lateral
 (
  select truck_id,milage from truck_readings
  where truck_id < truck_last_reading.truck_id
  order by truck_id desc, ts desc limit 1
 )
 as next_truck_last_reading;
Enter fullscreen mode Exit fullscreen mode

WITH RECURSIVE ... LATERAL

Finally, I can show both rows with a UNION ALL. And do this recursively to find the next truck_id and concatenate it to the result:

with recursive truck_last_reading as (
 (
  select
   last_truck_last_reading.truck_id,
   last_truck_last_reading.milage
   from truck_readings last_truck_last_reading
  order by
   last_truck_last_reading.truck_id desc,
   last_truck_last_reading.ts desc
  limit 1
 )
union all
 select
  next_truck_last_reading.truck_id,
  next_truck_last_reading.milage
 from truck_last_reading, lateral
 (
  select truck_id,milage from truck_readings
  where truck_id < truck_last_reading.truck_id
  order by truck_id desc, ts desc limit 1
 )
 as next_truck_last_reading
) select * from truck_last_reading
order by truck_id,milage;
Enter fullscreen mode Exit fullscreen mode

This is the final query which gives the required result (the last reading, in ts order, for each truck_id and return the milage). The execution plan is optimal:
,

                                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort (actual time=1078.569..1078.661 rows=2000 loops=1)
   Output: truck_last_reading.truck_id, truck_last_reading.milage
   Sort Key: truck_last_reading.truck_id, truck_last_reading.milage
   Sort Method: quicksort  Memory: 142kB
   CTE truck_last_reading
     ->  Recursive Union (actual time=0.872..1075.442 rows=2000 loops=1)
           ->  Subquery Scan on "*SELECT* 1" (actual time=0.871..0.872 rows=1 loops=1)
                 Output: "*SELECT* 1".truck_id, "*SELECT* 1".milage
                 ->  Limit (actual time=0.871..0.871 rows=1 loops=1)
                       Output: last_truck_last_reading.truck_id, last_truck_last_reading.milage, last_truck_last_reading.ts
                       ->  Index Only Scan Backward using truck_last_reading on public.truck_readings last_truck_last_reading (actual time=0.870..0.870 rows=1 loops=1)
                             Output: last_truck_last_reading.truck_id, last_truck_last_reading.milage, last_truck_last_reading.ts
                             Heap Fetches: 0
           ->  Nested Loop (actual time=0.525..0.526 rows=1 loops=2000)
                 Output: truck_readings.truck_id, truck_readings.milage
                 ->  WorkTable Scan on truck_last_reading truck_last_reading_1 (actual time=0.000..0.000 rows=1 loops=2000)
                       Output: truck_last_reading_1.truck_id, truck_last_reading_1.milage
                 ->  Limit (actual time=0.524..0.525 rows=1 loops=2000)
                       Output: truck_readings.truck_id, truck_readings.milage, truck_readings.ts
                       ->  Index Only Scan Backward using truck_last_reading on public.truck_readings (actual time=0.513..0.513 rows=1 loops=2000)
                             Output: truck_readings.truck_id, truck_readings.milage, truck_readings.ts
                             Index Cond: (truck_readings.truck_id < truck_last_reading_1.truck_id)
                             Heap Fetches: 0
   ->  CTE Scan on truck_last_reading (actual time=0.874..1077.252 rows=2000 loops=1)
         Output: truck_last_reading.truck_id, truck_last_reading.milage
 Planning Time: 0.165 ms
 Execution Time: 1078.821 ms
 Peak Memory Usage: 337 kB
(28 rows)
Enter fullscreen mode Exit fullscreen mode

There's no workarea memory needed because only one row is read for each iteration. This uses the index for its best usage: one Index Only Scan for each truck_id, and without the need to get a list of truck_id first. Each read from the index seeks directly to the right place and reads only one row. YugabyteDB pushes down the LIMIT 1 to the storage tablet server.

In theory this access should be possible with hash sharding on the truck_id, and sorting on the hash rather than the value. The goal is just to get to the next truck_id, then an order by yb_hash_code(truck_id) desc, truck_id desc, ts desc could use the same bit there's no implementation for it. Please open a git issue if you need it. But you can also declare the table with primary key (truck_id desc, ts desc) and pre-split it or let auto-splitting do it when table grows. Note that I've run this example on version 2.15.1 because 2.15.0 had an issue with this.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .