IF-THEN-ELSE in SQL WHERE CLAUSE

Franck Pachot - Sep 23 '23 - - Dev Community

SQL is a declarative language working on sets of rows. There is no IF-THEN-ELSE branching or FOR/WHILE/UNTIL loops typically found in procedural languages. However, SQL is Turing-Complete, and you can achieve similar outcomes in a declarative way.

I'm running the following example on the following table:

drop table demo;
create table demo (id bigserial, a int, b int);
create index demo_a on demo(a asc, b asc);
create index demo_b on demo(b asc, a asc);
Enter fullscreen mode Exit fullscreen mode

I've run those examples in YugabyteDB and PostgreSQL. Those databases share the same SQL features and syntax.

procedural pseudo-code

Let's say that, depending on some parameters, you want to run a query with a different WHERE clause. With a procedural language, you would do the following:


if $1 then
 if $2 then
  SELECT * FROM demo WHERE (a=1 AND b=1)
 else
  SELECT * FROM demo WHERE (a=1)
else
 if $2 then
  SELECT * FROM demo WHERE (b=1)
 else
  SELECT * FROM demo

Enter fullscreen mode Exit fullscreen mode

ou can use a single SQL statement to dynamically build another SQL statement or populate a temporary table. Here's how to achieve that in a single SQL statement.

SQL query

In one SQL statement, I concatenate all scenarios using UNION ALL. I include specific parameter-based conditions in each WHERE clause for individual scenarios:

prepare demo_where as
  SELECT * FROM demo WHERE (a=1 AND b=1) AND (     $1 AND     $2 )
 UNION ALL
  SELECT * FROM demo WHERE (a=1)         AND (     $1 AND NOT $2 )
 UNION ALL
  SELECT * FROM demo WHERE (b=1)         AND ( NOT $1 AND     $2 )
 UNION ALL
  SELECT * FROM demo                   WHERE ( NOT $1 AND NOT $2 )
;
Enter fullscreen mode Exit fullscreen mode

The different scenario are easy to read. You just have to make sure that all conditions are exclusive, or you will have duplicate rows.

Execution plan

Are you worried by a query that seem to read the same table four times? Remember that SQL is declarative. You cannot infer the execution complexity from it. With EXPLAIN you will see exactly what is executed.

I'm calling the prepared statement for each combination of parameters:

yugabyte=# explain (costs off) execute demo_where(true,true); 

                QUERY PLAN
-------------------------------------------
 Append
   ->  Index Scan using demo_b on demo
         Index Cond: ((b = 1) AND (a = 1))
(3 rows)

yugabyte=# explain (costs off) execute demo_where(false,true);
              QUERY PLAN
---------------------------------------
 Append
   ->  Index Scan using demo_b on demo
         Index Cond: (b = 1)
(3 rows)

yugabyte=# explain (costs off) execute demo_where(false,false);
       QUERY PLAN
------------------------
 Append
   ->  Seq Scan on demo
(2 rows)

yugabyte=# explain (costs off) execute demo_where(true,false);
              QUERY PLAN
---------------------------------------
 Append
   ->  Index Scan using demo_a on demo
         Index Cond: (a = 1)
(3 rows)
Enter fullscreen mode Exit fullscreen mode

In each case, only one scan is executed, and it selects the optimal access path based on the branch conditions. For instance, it will use a Sequential Scan when reading all rows, and it will utilize the most suitable index for each condition.

Generic Plan

However, what if the query planner opts for a generic plan? This situation will not happen because the query planner calculates the estimated cost by summing the cost of each branch. Consequently, the estimated cost will never be smaller than the cost of an individual branch:

postgres=# set plan_cache_mode to force_generic_plan;
SET
postgres=# explain (costs on) execute demo_where(true,true);

                               QUERY PLAN
------------------------------------------------------------------------
 Append  (cost=0.00..0.02 rows=4 width=16)
   ->  Result  (cost=0.00..0.00 rows=1 width=16)
         One-Time Filter: ($1 AND $2)
         ->  Seq Scan on demo  (cost=0.00..0.00 rows=1 width=16)
               Filter: ((a = 1) AND (b = 1))
   ->  Result  (cost=0.00..0.00 rows=1 width=16)
         One-Time Filter: ($1 AND (NOT $2))
         ->  Seq Scan on demo demo_1  (cost=0.00..0.00 rows=1 width=16)
               Filter: (a = 1)
   ->  Result  (cost=0.00..0.00 rows=1 width=16)
         One-Time Filter: ((NOT $1) AND $2)
         ->  Seq Scan on demo demo_2  (cost=0.00..0.00 rows=1 width=16)
               Filter: (b = 1)
   ->  Result  (cost=0.00..0.00 rows=1 width=16)
         One-Time Filter: ((NOT $1) AND (NOT $2))
         ->  Seq Scan on demo demo_3  (cost=0.00..0.00 rows=1 width=16)
(16 rows)
Enter fullscreen mode Exit fullscreen mode

Anyway, even if a generic plan is used, only one branch will be executed. The scenario discriminator conditions are evaluated once with a One-Time Filter, which efficiently filters out branches that do not apply. These discarded branches will be clearly indicated as (never executed) in the EXPLAIN ANALYZE output. This strategy guarantees the minimization of unnecessary processing, ensuring that only the pertinent branch is executed based on the provided conditions. In contrast to the estimated cost, the actual execution time is solely influenced by the active branch, while the rest report an actual time=0.000.

OR instead of UNION ALL

[added following Hettie Dombrovskaya's comment]
When all branches read from the same table, you may find simpler to use one select with OR conditions:

prepare demo_where as
  SELECT * FROM demo 
  WHERE  ((a=1 AND b=1) AND (     $1 AND     $2 ))
  OR     ((a=1)         AND (     $1 AND NOT $2 ))
  OR     ((b=1)         AND ( NOT $1 AND     $2 ))
  OR     (( NOT $1 AND NOT $2 ))
;
Enter fullscreen mode Exit fullscreen mode

This works with the custom plan, optimizing to apply the right condition, and then being able to use the index:

yugabyte=# explain (costs on) execute demo_where(true,true);

                             QUERY PLAN
---------------------------------------------------------------------
 Index Scan using demo_b on demo  (cost=0.00..5.25 rows=10 width=16)
   Index Cond: ((b = 1) AND (a = 1))
(2 rows)
Enter fullscreen mode Exit fullscreen mode

However, if a custom plan were chosen, this optimization doesn't happen at execution time (tested in PG16):

postgres=# set plan_cache_mode to force_generic_plan;
SET
postgres=# explain (costs on) execute demo_where(true,true);
                                                                  QUERY PLAN                                    
----------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..47.00 rows=466 width=16)
   Filter: (((a = 1) AND (b = 1) AND $1 AND $2) OR ((a = 1) AND $1 AND (NOT $2)) OR ((b = 1) AND (NOT $1) AND $2) OR ((NOT $1) AND (NOT $2)))
(2 rows)
Enter fullscreen mode Exit fullscreen mode

This should not happen as the cost is higher, but I prefer to stick with the UNION ALL

To summarize

In summary, while YugabyteDB and PostgreSQL do offer IF-THEN-ELSE constructs within the PL/pgSQL procedural language, it's worth noting that a single SQL statement can achieve conditional branching and may offer a more efficient approach for both development and execution.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .