The SQL analytic functions (aka window functions) are powerful and you may use them to filter the first row in a window. However, the analytic function is processed in a subquery to be filtered later, which may prevent some optimizations in data access. I will show how the same can be done more efficiently by filtering during the Index Scan.
Here is a simple table with observation type, name, text and timestamp:
drop table observations;
create extension if not exists orafce;
create table observations (
primary key ( observation_type asc,observation_name asc, observation_date desc )
, observation_type text
, observation_name text
, observation_date timestamptz
, observation_text text
);
insert into observations select
dbms_random.string('U',1),dbms_random.string('U',2),clock_timestamp(), dbms_random.string('L',100) from generate_series(1,1000);
\watch 0.01
DENSE_RANK() = 1
I want the last value for each observation in a specific type.
Here is the execution plan when using DENSE_RANK() and filtering on the rank number 1:
explain (analyze, dist, costs off)
select o."observation_type", o."observation_name", o."observation_text"
from (
select *, dense_rank () over (
partition by o."observation_type", o."observation_name"
order by o."observation_date" desc
) as rank_number from observations o
where o."observation_type" = 'Z'
) as o where "o".rank_number=1
;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Subquery Scan on o (actual time=3.176..466.641 rows=143 loops=1)
Filter: (o.rank_number = 1)
Rows Removed by Filter: 162857
-> WindowAgg (actual time=3.174..456.372 rows=163000 loops=1)
-> Index Scan using observations_pkey on observations o_1 (actual time=3.156..342.562 rows=163000 loops=1)
Index Cond: (observation_type = 'Z'::text)
Storage Index Read Requests: 160
Storage Index Execution Time: 228.001 ms
Planning Time: 0.103 ms
Execution Time: 466.741 ms
Storage Read Requests: 160
Storage Write Requests: 0
Storage Execution Time: 228.001 ms
Peak Memory Usage: 491 kB
(14 rows)
This had read rows=163000
rows from the distributed storage with Storage Index Read Requests: 160
network calls and then the postgres backend has discarded Rows Removed by Filter: 162857
from those rows to keep only rows=143
that verify the filter.
DISTINCT and scalar subquery
This could be optimized by reading only the required rows, limiting the network calls, transfer, and further single-process filtering.
This is possible on this index thanks to YugabyteDB hybrid scan but, at least in this version (YB-2.17), the filtering on the window function is not pushed down to the index scan.
To allow the hybrid scan optimization, I'll use DISTINCT in a subquery to replace the PARTITION BY of the analytic window. Then, for each window I'll get the last value with a scalar subquery doing the ORDER BY and LIMIT to get the first row for each, which is equivalent to DENSE_RANK()=1
explain (analyze, dist, costs off)
select d."observation_type", d."observation_name" , (
select observation_text
from observations
where ( observation_type , observation_name )
=(d."observation_type", d."observation_name")
order by observation_date desc
limit 1
) from (
-- get all distinct values with push down optimization
select distinct o."observation_type", o."observation_name"
from observations o
-- use range condition to force an index scan
where o."observation_type" between 'Z' and 'Z'
) d
;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Subquery Scan on d (actual time=1.757..37.104 rows=143 loops=1)
-> Unique (actual time=1.392..1.695 rows=143 loops=1)
-> Index Scan using observations_pkey on observations o_1 (actual time=1.391..1.607 rows=143 loops=1)
Index Cond: ((observation_type >= 'Z'::text) AND (observation_type <= 'Z'::text))
Storage Index Read Requests: 1
Storage Index Execution Time: 0.000 ms
SubPlan 1
-> Limit (actual time=0.246..0.246 rows=1 loops=143)
-> Index Scan using observations_pkey on observations (actual time=0.237..0.237 rows=1 loops=143)
Index Cond: ((observation_type = d.observation_type) AND (observation_name = d.observation_name))
Storage Index Read Requests: 1
Storage Index Execution Time: 0.224 ms
Planning Time: 1.614 ms
Execution Time: 37.187 ms
Storage Read Requests: 144
Storage Write Requests: 0
Storage Execution Time: 32.000 ms
Peak Memory Usage: 56 kB
(18 rows)
I have added between 'Z' and 'Z'
because in this YugabyteDB version an Index Scan would not be used with = 'Z'
. It is always important to check the execution plan when doing such optimization, and add a comment in the query.
DISTINCT and LATERAL join
When you need more than one column, it is better to query from a LATERAL join rather than a scalar subquery:
explain (analyze, dist, costs off)
select o.* from (
-- get all distinct values with push down optimization
select distinct o."observation_type", o."observation_name"
from observations o
-- use range condition to force an index scan
where o."observation_type" between 'Z' and 'Z'
) d , lateral (
-- get additional information for each distinct value
select * from observations
where ( observation_type , observation_name )
=(d."observation_type", d."observation_name")
order by observation_date desc
limit 1
) o
;
The execution plan is similar. The most important is to verify the absence of Rows Removed by Filter
, the Index Scan
or Index Only Scan
for both subqueries, the small Storage Index Read Requests
for the Unique
branch, and the total Storage Read Requests
matching the rows=
for the Unique
branch.
This works in YugabyteDB thanks to Hybrid Scan (aka Skip Scan, aka Loose Index Scan). You can't use that in PostgreSQL.