Optimizing Nested Loop joins on YugabyteDB with jOOQ

Franck Pachot - Jul 4 '22 - - Dev Community

I've described in a previous post a workaround for performance issue #4903. It was about about high latency during Nested Loop joins when there are many rows from the outer table. The workaround was:

  • run a first query to get a list of join-column values
  • push down this list in a WHERE IN() clause on the inner table.

That may be cumbersome in SQL. I did it with psql variable set with gset and lazily mentioned: The same can be done from any language with dynamic SQL.

If you use jOOQ, everything about SQL becomes easier to code and the barrier between dynamic and static SQL is small as all are typesafe.

Here is an example based on my previous jOOQ on YugabyteDB post. I've just changed the query to do a simple join, and created an index on order_details(product_id)

Here is the relevant part - the jOOQ query:

   Result<Record> result = ctx
    .select()
    .from(p)
    .join(d).on(p.PRODUCT_ID.eq(d.PRODUCT_ID))
    .where(p.UNIT_PRICE.gt(50.f))
    .fetch();
Enter fullscreen mode Exit fullscreen mode

This joins product and order_details to get all orders for products having a unit_price higher than 50.

I've logged the SQL executed and run it with an EXPLAIN ANALYZE:


prepare query0 as

select "p"."product_id", "p"."product_name", "p"."supplier_id", "p"."category_id", "p"."quantity_per_unit", "p"."unit_price", "p"."units_in_stock", "p"."units_on_order", "p"."reorder_level", "p"."discontinued", "d"."order_id", "d"."product_id", "d"."unit_price", "d"."quantity", "d"."discount" from "public"."products" as "p" join "public"."order_details" as "d" on "p"."product_id" = "d"."product_id" where "p"."unit_price" > $1
;

explain (costs off, analyze) execute query0(50);

                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Nested Loop (actual time=6.346..1083.373 rows=197 loops=1)
   ->  Seq Scan on order_details d (actual time=5.293..7.478 rows=2155 loops=1)
   ->  Index Scan using products_pkey on products p (actual time=0.486..0.486 rows=0 loops=2155)
         Index Cond: (product_id = d.product_id)
         Filter: (unit_price > '50'::real)
         Rows Removed by Filter: 1
 Planning Time: 0.209 ms
 Execution Time: 1083.488 ms
 Peak Memory Usage: 24 kB
(9 rows)

Enter fullscreen mode Exit fullscreen mode

This takes 1 second to return 200 rows. The reason, explained in the previous post, and the git issue, is simple: we read all order_details, and then loops=2155 to read from products.

Until the optimization is done in the database, we can improve this by getting first the list of products, and then use this list in the WHERE clause when reading from order_details.

In jOOQ, this is really simple. I've added the subquery with the same WHERE clause:

   Result<Record> result = ctx
    .select()
    .from(p)
    .join(d).on(p.PRODUCT_ID.eq(d.PRODUCT_ID))
    .where(p.UNIT_PRICE.gt(50.f))
// workaround: add a subquery returning a list of PRODUCT_ID:
    .and( d.PRODUCT_ID.in ( ctx.select(p.PRODUCT_ID).from(p)
    .where(p.UNIT_PRICE.gt(50.f))
    .fetch(p.PRODUCT_ID) ) )
// this workaround will not be needed when https://github.com/yugabyte/yugabyte-db/issues/4903 is solved
    .fetch();

Enter fullscreen mode Exit fullscreen mode

This generates two SQL statements, which I EXPLAIN ANALYZE to look at the performance. The first gets the list:

prepare query1 as

select "p"."product_id" from "public"."products" as "p" where "p"."unit_price" > $1
; 

explain (costs off, analyze) execute query1(50);

                            QUERY PLAN
------------------------------------------------------------------
 Seq Scan on products p (actual time=0.886..2.605 rows=7 loops=1)
   Filter: (unit_price > '50'::real)
   Rows Removed by Filter: 70
 Planning Time: 0.074 ms
 Execution Time: 2.648 ms
 Peak Memory Usage: 8 kB
(6 rows)

yb_demo_northwind=# execute query1(50);
 product_id
------------
         29
         20
         51
          9
         18
         38
         59
(7 rows)

Enter fullscreen mode Exit fullscreen mode

4 milliseconds to get the list, and no need for any formatting here because jOOQ directly uses it in the second query:

prepare query2 as

select "p"."product_id", "p"."product_name", "p"."supplier_id", "p"."category_id", "p"."quantity_per_unit", "p"."unit_price", "p"."units_in_stock", "p"."units_on_order", "p"."reorder_level", "p"."discontinued", "d"."order_id", "d"."product_id", "d"."unit_price", "d"."quantity", "d"."discount" from "public"."products" as "p" join "public"."order_details" as "d" on "p"."product_id" = "d"."product_id" where ("p"."unit_price" > $1 and "d"."product_id" in ($2, $3, $4, $5, $6, $7, $8))
;

explain (costs off, analyze) execute query2(50
, '29','20','51','9','18','38','59'
);

                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Nested Loop (actual time=3.458..110.218 rows=197 loops=1)
   ->  Index Scan using order_details_product_id on order_details d (actual time=2.784..9.069 rows=197 loops=1)
         Index Cond: (product_id = ANY ('{29,20,51,9,18,38,59}'::smallint[]))
   ->  Index Scan using products_pkey on products p (actual time=0.498..0.498 rows=1 loops=197)
         Index Cond: (product_id = d.product_id)
         Filter: (unit_price > '50'::real)
 Planning Time: 37.526 ms
 Execution Time: 110.361 ms
 Peak Memory Usage: 1088 kB
(9 rows)
Enter fullscreen mode Exit fullscreen mode

This reduced the number of loops because the filtering on products is now done on both sides of the join, thanks to the additional predicate. The response time is reduced accordingly.

A Nested Loop was still used here because the database is small, but with this query it can efficiently switch to a Hash Join:

                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Hash Join (actual time=12.914..12.955 rows=197 loops=1)
   Hash Cond: (p.product_id = d.product_id)
   ->  Seq Scan on products p (actual time=1.961..1.968 rows=7 loops=1)
         Remote Filter: (unit_price > '50'::real)
   ->  Hash (actual time=10.934..10.934 rows=197 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 18kB
         ->  Index Scan using order_details_product_id on order_details d (actual time=2.628..10.891 rows=197 loops=1)
               Index Cond: (product_id = ANY ('{29,20,51,9,18,38,59}'::smallint[]))
 Planning Time: 0.267 ms
 Execution Time: 13.031 ms
 Peak Memory Usage: 73 kB
(11 rows)
Enter fullscreen mode Exit fullscreen mode

Each table access is efficient, filtering upfront during the scan, thanks to the pushdown of the predicates. Then, using one Hash Join instead of Nested Loops reduces the remote calls to the storage nodes to the minimum.

Please remember that this is a workaround, and you need to get a correct idea of the number of values in the list that is pushed down, and test it.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .