In my previous post I mentioned that the Batched Nested Loop feature has been improved and is now enabled by default. However, if you previously used this feature during its preview phase and utilized optimizer hints, there is an important change that you should be aware of. In short, it was an alternative to Nested Loop in preview, but now it is a full-fledged join method completing the existing ones: Nested Loop, Merge Join, and Hash Join.
TL;DR: If you used the NestLoop
hint in the previous version to enforce a Nested loop and set yb_bnl_batch_size
higher than 1 to batch it, you need to replace the hint with YBbatchedNL
. This is because NestLoop
hint will now enforce only the non-batched one.
I've started a cluster with the Northwind demo and two additional indexes to join customers
to orders
:
yugabyted start
yugabyted demo connect
\d
create index customers_city on customers(city, customer_id);
create index orders_customer_id on orders(customer_id);
Here is the execution plan in YugabyteDB 2.21 with all defaults and no hints:
yb_demo_northwind=# explain (analyze, costs off)
select * from customers
join orders using(customer_id)
where city='London'
order by customer_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Sort (actual time=10.076..10.077 rows=46 loops=1)
Sort Key: customers.customer_id
Sort Method: quicksort Memory: 37kB
-> Hash Join (actual time=9.438..10.027 rows=46 loops=1)
Hash Cond: (orders.customer_id = customers.customer_id)
-> Seq Scan on orders (actual time=2.336..2.734 rows=830 loops=1)
-> Hash (actual time=7.053..7.053 rows=6 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Index Scan using customers_city on customers (actual time=7.018..7.029 rows=6 loops=1)
Index Cond: ((city)::text = 'London'::text)
Planning Time: 0.248 ms
Execution Time: 10.169 ms
Peak Memory Usage: 315 kB
(13 rows)
In the previous version, to enable a batched nested loop you used the NestedLoop
hint for the join method and set yb_bnl_batch_size
to enabled Batched Nested Loop.
If you re-use the same hint in 2.21 you will see the non-batched nested loop:
yb_demo_northwind=# explain (analyze, costs off)
/*+ NestLoop(customers orders) Set(yb_bnl_batch_size 1024) */
select * from customers
join orders using(customer_id)
where city='London'
order by customer_id;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Nested Loop (actual time=2.692..6.457 rows=46 loops=1)
-> Index Scan using customers_city on customers (actual time=1.440..1.449 rows=6 loops=1)
Index Cond: ((city)::text = 'London'::text)
-> Index Scan using orders_customer_id on orders (actual time=0.801..0.808 rows=8 loops=6)
Index Cond: (customer_id = customers.customer_id)
Planning Time: 0.215 ms
Execution Time: 6.526 ms
Peak Memory Usage: 56 kB
(8 rows)
;
This does the same a PostgreSQL Nested Loop
with one loop for each outer row (rows=6
/ loops=6
)
To force a Batched Nested Loop you don't need to set the batch size, but use the new hint YBbatchedNL
:
explain (analyze, costs off)
/*+ YBbatchedNL(customers orders) */
select * from customers
join orders using(customer_id)
where city='London'
order by customer_id;
;
QUERY PLAN
------------------------------------------------------------------------------------------------
YB Batched Nested Loop Join (actual time=3.944..3.958 rows=46 loops=1)
Join Filter: (customers.customer_id = orders.customer_id)
Sort Keys: customers.customer_id
-> Index Scan using customers_city on customers (actual time=2.620..2.653 rows=6 loops=1)
Index Cond: ((city)::text = 'London'::text)
-> Index Scan using orders_customer_id on orders (actual time=1.097..1.136 rows=46 loops=1)
Index Cond: (customer_id = ANY (ARRAY[customers.customer_id, $1, $2, ..., $1023]))
Planning Time: 0.654 ms
Execution Time: 4.289 ms
Peak Memory Usage: 593 kB
(10 rows)
Now, all rows were fetched in one loop (loops=1
)
If you didn't use a hint but session parameters to disable other methods, you can do the same. However, now, disabling Nested Loop with enable_nestloop=off
still allows the Batched Nested Loop:
yb_demo_northwind=# set enable_hashjoin=off;
SET
yb_demo_northwind=# set enable_nestloop=off;
SET
yb_demo_northwind=# explain (analyze, costs off)
select * from customers
join orders using(customer_id)
where city='London'
order by customer_id;
QUERY PLAN
------------------------------------------------------------------------------------------------
YB Batched Nested Loop Join (actual time=2.908..2.926 rows=46 loops=1)
Join Filter: (customers.customer_id = orders.customer_id)
Sort Keys: customers.customer_id
-> Index Scan using customers_city on customers (actual time=1.355..1.365 rows=6 loops=1)
Index Cond: ((city)::text = 'London'::text)
-> Index Scan using orders_customer_id on orders (actual time=1.339..1.381 rows=46 loops=1)
Index Cond: (customer_id = ANY (ARRAY[customers.customer_id, $1, $2, ..., $1023]))
Planning Time: 0.648 ms
Execution Time: 3.213 ms
Peak Memory Usage: 657 kB
(10 rows)
If you want to disable all Nested Loops, you need to also set yb_enable_batchednl=off
:
yb_demo_northwind=# set enable_hashjoin=off;
SET
yb_demo_northwind=# set enable_nestloop=off;
SET
yb_demo_northwind=# set yb_enable_batchednl=off;
SET
yb_demo_northwind=# explain (analyze, costs off)
select * from customers
join orders using(customer_id)
where city='London'
order by customer_id;
QUERY PLAN
----------------------------------------------------------------------------------------------
Merge Join (actual time=5.839..6.334 rows=46 loops=1)
Merge Cond: (customers.customer_id = orders.customer_id)
-> Index Scan using customers_city on customers (actual time=1.961..1.983 rows=6 loops=1)
Index Cond: ((city)::text = 'London'::text)
-> Sort (actual time=3.851..3.897 rows=679 loops=1)
Sort Key: orders.customer_id
Sort Method: quicksort Memory: 150kB
-> Seq Scan on orders (actual time=1.992..2.717 rows=830 loops=1)
Planning Time: 0.306 ms
Execution Time: 6.465 ms
Peak Memory Usage: 373 kB
(11 rows)
Finally if I reset all parameters but analyze the tables, The Batched Nested Loop is chosen by the query planner:
yb_demo_northwind=# \c
You are now connected to database "yb_demo_northwind" as user "yugabyte".
yb_demo_northwind=# analyze;
WARNING: 'analyze' is a beta feature!
ANALYZE
QUERY PLAN
------------------------------------------------------------------------------------------------
YB Batched Nested Loop Join (actual time=7.830..7.851 rows=46 loops=1)
Join Filter: (customers.customer_id = orders.customer_id)
Sort Keys: customers.customer_id
-> Index Scan using customers_city on customers (actual time=1.058..1.066 rows=6 loops=1)
Index Cond: ((city)::text = 'London'::text)
-> Index Scan using orders_customer_id on orders (actual time=6.150..6.208 rows=46 loops=1)
Index Cond: (customer_id = ANY (ARRAY[customers.customer_id, $1, $2, ..., $1023]))
Planning Time: 17.507 ms
Execution Time: 8.226 ms
Peak Memory Usage: 602 kB
(10 rows)
Letting the query planner choose the join method is probably the best, especially if you also set the Cost Based Optimizer with yb_enable_optimizer_statistics
and yb_enable_base_scans_cost_model
. But if you need to disable batching for a specific join, you can use YBbatchedNL
yb_demo_northwind=# explain (analyze, costs off)
/*+ NoYBbatchedNL(customers orders) */
select * from customers
join orders using(customer_id)
where city='London'
order by customer_id;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Nested Loop (actual time=2.573..7.722 rows=46 loops=1)
-> Index Scan using customers_city on customers (actual time=1.548..1.558 rows=6 loops=1)
Index Cond: ((city)::text = 'London'::text)
-> Index Scan using orders_customer_id on orders (actual time=0.989..0.999 rows=8 loops=6)
Index Cond: (customer_id = customers.customer_id)
Planning Time: 0.433 ms
Execution Time: 7.818 ms
Peak Memory Usage: 56 kB
(8 rows)