Read local partition first, then global if not found locally

Franck Pachot - Sep 1 '22 - - Dev Community

Here is another case for local reads. I have a customer_id but I don't know her country. However, there are good chances that I'm connected where her data resides. Think of it like the customers of a European bank are probably connecting from Europe. I want low latency queries for this case. If the customer is traveling and is querying from US, for example, she will probably accept a longer latency, knowing that her bank account is in Europe.

I'm running this demo on the same data as in the previous post, all DDL and DML was in an earlier post. I used yb_is_local_table(tableoid) to read from the local table.

connect to the user region

The user 9f0345c1-ff88-477d-8f87-b6ae3717ba37 is in earth region (port 5433):


yugabyte=# \c - - - 5433

psql (13.5, server 11.2-YB-2.15.1.0-b0)
You are now connected to database "yugabyte" as user "postgres".

yugabyte=# show listen_addresses;

       listen_addresses
------------------------------
 yb-tserver-0.base.earth.star
(1 row)

yugabyte=# select * from customers
           where id in ('9f0345c1-ff88-477d-8f87-b6ae3717ba37','3a890dfc-2a99-4ef4-9939-9fea1c9241ad')
           and yb_is_local_table(tableoid);

                  id                  | planet | info
--------------------------------------+--------+------
 9f0345c1-ff88-477d-8f87-b6ae3717ba37 | earth  | 1465
(1 row)


Enter fullscreen mode Exit fullscreen mode

I did this to verify where my user is, but this doesn't solve my problem: I need to know the user region, either to connect to the right one and query for the local partition.

Union All

I know that a user can be in only one region. This is not guaranteed by the database because the declarative partitioning used here, coming from PostgreSQL, has no global indexes. The primary key includes the region (planet in my example):

yugabyte=# \d customers
           Partitioned table "public.customers"
 Column | Type | Collation | Nullable |      Default
--------+------+-----------+----------+-------------------
 id     | uuid |           | not null | gen_random_uuid()
 planet | text |           | not null |
 info   | text |           |          |
Partition key: LIST (planet)
Indexes:
    "customers_pkey" PRIMARY KEY, lsm (id HASH, planet ASC)
Number of partitions: 3 (Use \d+ to list them.)
Enter fullscreen mode Exit fullscreen mode

However, if I know that I generated them uniquely I can rely on the fact that my query for one id will return only one row.

select * from customers 
 where id='9f0345c1-ff88-477d-8f87-b6ae3717ba37'
 and yb_is_local_table(tableoid)
union all
select * from customers 
 where id='9f0345c1-ff88-477d-8f87-b6ae3717ba37'
and not yb_is_local_table(tableoid)
limit 1;

                  id                  | planet | info
--------------------------------------+--------+------
 9f0345c1-ff88-477d-8f87-b6ae3717ba37 | earth  | 1465

Enter fullscreen mode Exit fullscreen mode

Let's look at the execution when connected to earth

\c - - - 5433

explain (costs off, analyze, summary off)
select * from customers 
 where id='9f0345c1-ff88-477d-8f87-b6ae3717ba37'
 and yb_is_local_table(tableoid)
union all
select * from customers 
 where id='9f0345c1-ff88-477d-8f87-b6ae3717ba37'
 and not yb_is_local_table(tableoid)
limit 1;

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Limit (actual time=0.653..0.655 rows=1 loops=1)
   ->  Append (actual time=0.614..0.614 rows=1 loops=1)
         ->  Append (actual time=0.614..0.614 rows=1 loops=1)
               ->  Index Scan using customers_earth_pkey on customers_earth (actual time=0.613..0.613 rows=1 loops=1)
                     Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                     Filter: yb_is_local_table(tableoid)
         ->  Append (never executed)
               ->  Index Scan using customers_earth_pkey on customers_earth customers_earth_1 (never executed)
                     Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                     Filter: (NOT yb_is_local_table(tableoid))
               ->  Index Scan using customers_mars_pkey on customers_mars (never executed)
                     Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                     Filter: (NOT yb_is_local_table(tableoid))
               ->  Index Scan using customers_moon_pkey on customers_moon (never executed)
                     Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                     Filter: (NOT yb_is_local_table(tableoid))

Enter fullscreen mode Exit fullscreen mode

Only the first branch, reading customers_earth (the local one) has been executed, returning rows=1 and the others were skipped ((never executed)).

The same when run from moon:

\c - - - 5434

explain (costs off, analyze, summary off)
select * from customers 
 where id='9f0345c1-ff88-477d-8f87-b6ae3717ba37'
 and yb_is_local_table(tableoid)
union all
select * from customers 
 where id='9f0345c1-ff88-477d-8f87-b6ae3717ba37'
and not yb_is_local_table(tableoid)
limit 1;

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Limit (actual time=1.122..1.124 rows=1 loops=1)
   ->  Append (actual time=1.121..1.121 rows=1 loops=1)
         ->  Append (actual time=0.481..0.481 rows=0 loops=1)
               ->  Index Scan using customers_moon_pkey on customers_moon (actual time=0.480..0.480 rows=0 loops=1)
                     Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                     Filter: yb_is_local_table(tableoid)
         ->  Append (actual time=0.640..0.640 rows=1 loops=1)
               ->  Index Scan using customers_earth_pkey on customers_earth (actual time=0.639..0.639 rows=1 loops=1)
                     Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                     Filter: (NOT yb_is_local_table(tableoid))
               ->  Index Scan using customers_mars_pkey on customers_mars (never executed)
                     Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                     Filter: (NOT yb_is_local_table(tableoid))
               ->  Index Scan using customers_moon_pkey on customers_moon customers_moon_1 (never executed)
                     Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                     Filter: (NOT yb_is_local_table(tableoid))
(16 rows)
Enter fullscreen mode Exit fullscreen mode

The first branch which is now on customers_moon (the local one) returned no rows (rows=0) and then the second branch was executed, partition after partition, until it returns a row.

This fits our goal: low latency when the user is local, higher latency if she is travelling.

However, it has two problems:

  • we may read two times the local partition (one because it is local, and another one if it is the first read by the global branch)
  • we rely on the execution order of the UNION ALL. Even if this guess can be verified (YugabyteDB is open source), this violates the SQL language which is declarative and not procedural. One day, an optimisation will come and change the order and we will ahve a bug.

WITH RECURSIVE

The recursive Common Table Expressions (CTE), even if still declarative SQL, is a better guarantee of the procedural order of the execution, because one level must be executed before the nested one. My query will be:

with recursive my_cte as (
 select * from customers 
  where id='9f0345c1-ff88-477d-8f87-b6ae3717ba37'
  and yb_is_local_table(tableoid)
 union all 
 (
  select * from customers 
  where id='9f0345c1-ff88-477d-8f87-b6ae3717ba37'
  and not yb_is_local_table(tableoid)
  union all select * from my_cte 
 )
) select * from my_cte limit 1;
Enter fullscreen mode Exit fullscreen mode

Here is the execution plan when connected to earth - the home of my user:

\c - - - 5433

explain (costs off, analyze, summary off)
with recursive my_cte as (
 select * from customers 
  where id='9f0345c1-ff88-477d-8f87-b6ae3717ba37'
  and yb_is_local_table(tableoid)
 union all 
 (
  select * from customers 
  where id='9f0345c1-ff88-477d-8f87-b6ae3717ba37'
  and not yb_is_local_table(tableoid)
  union all select * from my_cte 
 )
) select * from my_cte limit 1;

                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit (actual time=0.650..0.651 rows=1 loops=1)
   CTE my_cte
     ->  Recursive Union (actual time=0.648..0.648 rows=1 loops=1)
           ->  Append (actual time=0.647..0.647 rows=1 loops=1)
                 ->  Index Scan using customers_earth_pkey on customers_earth (actual time=0.646..0.646 rows=1 loops=1)
                       Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                       Filter: yb_is_local_table(tableoid)
           ->  Append (never executed)
                 ->  Append (never executed)
                       ->  Index Scan using customers_earth_pkey on customers_earth customers_earth_1 (never executed)
                             Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                       ->  Index Scan using customers_mars_pkey on customers_mars (never executed)
                             Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                       ->  Index Scan using customers_moon_pkey on customers_moon (never executed)
                             Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                 ->  WorkTable Scan on my_cte my_cte_1 (never executed)
   ->  CTE Scan on my_cte (actual time=0.649..0.649 rows=1 loops=1)
(17 rows)

Enter fullscreen mode Exit fullscreen mode

And the same when connected to moon:

\c - - - 5434

explain (costs off, analyze, summary off)
with recursive my_cte as (
 select * from customers 
  where id='9f0345c1-ff88-477d-8f87-b6ae3717ba37'
  and yb_is_local_table(tableoid)
 union all 
 (
  select * from customers 
  where id='9f0345c1-ff88-477d-8f87-b6ae3717ba37'
  and not yb_is_local_table(tableoid)
  union all select * from my_cte 
 )
) select * from my_cte limit 1;

                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit (actual time=0.650..0.651 rows=1 loops=1)
   CTE my_cte
     ->  Recursive Union (actual time=0.648..0.648 rows=1 loops=1)
           ->  Append (actual time=0.647..0.647 rows=1 loops=1)
                 ->  Index Scan using customers_earth_pkey on customers_earth (actual time=0.646..0.646 rows=1 loops=1)
                       Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                       Filter: yb_is_local_table(tableoid)
           ->  Append (never executed)
                 ->  Append (never executed)
                       ->  Index Scan using customers_earth_pkey on customers_earth customers_earth_1 (never executed)
                             Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                       ->  Index Scan using customers_mars_pkey on customers_mars (never executed)
                             Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                       ->  Index Scan using customers_moon_pkey on customers_moon (never executed)
                             Index Cond: (id = '9f0345c1-ff88-477d-8f87-b6ae3717ba37'::uuid)
                 ->  WorkTable Scan on my_cte my_cte_1 (never executed)
   ->  CTE Scan on my_cte (actual time=0.649..0.649 rows=1 loops=1)
(17 rows)

Enter fullscreen mode Exit fullscreen mode

This solves my problem with the order of execution, at the price of a little more complex SQL statement. I still have the problem that the local partition can be read twice. For example, when querying a users from mars:

explain (costs off, analyze, summary off)
with recursive my_cte as (
 select * from customers 
  where id='841efbb7-4833-4f65-ab07-d557ebc4427a'
  and yb_is_local_table(tableoid)
 union all 
 (
  select * from customers 
  where id='841efbb7-4833-4f65-ab07-d557ebc4427a'
  and not yb_is_local_table(tableoid)
  union all select * from my_cte 
 )
) select * from my_cte limit 1;

                                                          QUERY PLAN                                                      
------------------------------------------------------------------------------------------------------------------------------
 Limit (actual time=2.364..2.365 rows=1 loops=1)
   CTE my_cte
     ->  Recursive Union (actual time=2.362..2.362 rows=1 loops=1)
           ->  Append (actual time=0.871..0.871 rows=0 loops=1)
                 ->  Index Scan using customers_moon_pkey on customers_moon (actual time=0.870..0.870 rows=0 loops=1)
                       Index Cond: (id = '841efbb7-4833-4f65-ab07-d557ebc4427a'::uuid)
                       Filter: yb_is_local_table(tableoid)
           ->  Append (actual time=1.490..1.490 rows=1 loops=1)
                 ->  Append (actual time=1.489..1.489 rows=1 loops=1)
                       ->  Index Scan using customers_earth_pkey on customers_earth (actual time=0.716..0.716 rows=0 loops=1)
                             Index Cond: (id = '841efbb7-4833-4f65-ab07-d557ebc4427a'::uuid)
                             Filter: (NOT yb_is_local_table(tableoid))
                       ->  Index Scan using customers_mars_pkey on customers_mars (actual time=0.772..0.772 rows=1 loops=1)
                             Index Cond: (id = '841efbb7-4833-4f65-ab07-d557ebc4427a'::uuid)
                             Filter: (NOT yb_is_local_table(tableoid))
                       ->  Index Scan using customers_moon_pkey on customers_moon customers_moon_1 (never executed)
                             Index Cond: (id = '841efbb7-4833-4f65-ab07-d557ebc4427a'::uuid)
                             Filter: (NOT yb_is_local_table(tableoid))
                 ->  WorkTable Scan on my_cte my_cte_1 (never executed)
   ->  CTE Scan on my_cte (actual time=2.363..2.363 rows=1 loops=1)
(20 rows)

Enter fullscreen mode Exit fullscreen mode

The row was not found locally (customers_moon (actual time=0.870..0.870 rows=0 loops=1)), and that happend in the second iteration as well:

                 ->  Append (actual time=1.489..1.489 rows=1 loops=1)
                       ->  Index Scan using customers_earth_pkey on customers_earth (actual time=0.716..0.716 rows=0 loops=1)
                             Index Cond: (id = '841efbb7-4833-4f65-ab07-d557ebc4427a'::uuid)
                             Filter: (NOT yb_is_local_table(tableoid))
Enter fullscreen mode Exit fullscreen mode

The reason is that, apparently, we don't do partition pruning for where not yb_is_local_table() as we do for where yb_is_local_table(). This should not matter as it typically adds one millisecond to a read that is probably 10ms or 100ms cross-region. But you it is a problem, you can open a git issue for an improvement.

This blog series comes from our user requirements. If you have other ideas, please share. The YugabyteDB database geo-distribution offer plenty of possibilities. If you want to know more about tablespaces for geo-partitioning, I'll talk about it on Friday's YFTT

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .