Row Level Security with an ARRAY of tenants set in session parameter (RLS)

Franck Pachot - Nov 18 '23 - - Dev Community

One way to effectively implement multitenancy for SaaS while optimizing resources and maintenance is to store all data in a single table and add a tenant_id as a key. By utilizing RLS (Row Level Security), each tenant will only be able to view their own data. This can be achieved by setting a GUC (Global User Configuration) session variable containing the tenant_id and incorporating it into the RLS policy. This will ensure tenant isolation by automatically adding a WHERE clause to all queries.

I exposed the solution in a previous post.

In certain cases, the policy may allow access to a list of tenants instead of just one. Unfortunately, it's not possible to define a GUC session variable as an array. However, there are workarounds available to handle this situation effectively.

Demo table with tenant_id

I create a table consisting of 26 tenant_id values, each representing a letter from A to Z, and ten million rows.

create extension if not exists orafce;
alter function dbms_random.string volatile;
create table demo as 
 select generate_series(1,10000000) value
 , dbms_random.string('u',1) tenant_id
;
create index demo_tenant_id on demo(tenant_id, value);

Enter fullscreen mode Exit fullscreen mode

I will explain how to limit the queries to specific tenants during the demo. In this case, we will be restricting the queries to five tenants: 'A', 'E', 'I', 'O', and 'U'.

Without RLS policy

Without using row level security (RLS), I need to include a WHERE clause utilizing an ARRAY with either IN() or = ANY.

yugabyte=# explain (costs off, analyze)
select tenant_id from demo 
where tenant_id = ANY (ARRAY['A','E','I','O','U']::text[]) ;

                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Index Only Scan using demo_tenant_id on demo (actual time=7.716..2193.248 rows=1923830 loops=1)
   Index Cond: (tenant_id = ANY ('{A,E,I,O,U}'::text[]))
   Heap Fetches: 0
 Planning Time: 8.809 ms
 Execution Time: 2346.935 ms
 Peak Memory Usage: 18 kB
(6 rows)
Enter fullscreen mode Exit fullscreen mode

With a temporary table

To compare the performance, I use a WITH clause to utilize a temporary table instead of an array:

yugabyte=> explain (costs off, analyze)
/*+ Set(yb_bnl_batch_size 1024) */
with allowed(tenant_id) as (values ('A'), ('E'), ('I'), ('O'), ('U') )
select tenant_id from demo 
where tenant_id in ( select tenant_id::text from allowed) ;

                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 YB Batched Nested Loop Join (actual time=12.210..2671.709 rows=1923830 loops=1)
   Join Filter: (demo.tenant_id = allowed.tenant_id)
   CTE allowed
     ->  Values Scan on "*VALUES*" (actual time=0.002..0.003 rows=5 loops=1)
   ->  HashAggregate (actual time=0.013..0.014 rows=5 loops=1)
         Group Key: allowed.tenant_id
         ->  CTE Scan on allowed (actual time=0.005..0.007 rows=5 loops=1)
   ->  Index Only Scan using demo_tenant_id on demo (actual time=12.119..1626.838 rows=1923830 loops=1)
         Index Cond: (tenant_id = ANY (ARRAY[allowed.tenant_id, $2, $3, ..., $1024]))
         Heap Fetches: 0
 Planning Time: 0.565 ms
 Execution Time: 2802.131 ms
 Peak Memory Usage: 494 kB
(13 rows)
Enter fullscreen mode Exit fullscreen mode

With batched nested loop this is the same as with a list if values.

With UNNEST array

I can do the same with a subquery:

explain (costs off, analyze)
/*+ NestLoop(unnest demo) Set(yb_bnl_batch_size 1024) */
select tenant_id from demo 
where tenant_id in ( 
 select * from unnest(ARRAY['A','E','I','O','U']::text) 
) ;

                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 YB Batched Nested Loop Join (actual time=3.202..3075.332 rows=1923830 loops=1)
   Join Filter: (demo.tenant_id = unnest.unnest)
   ->  HashAggregate (actual time=0.013..0.014 rows=5 loops=1)
         Group Key: unnest.unnest
         ->  Function Scan on unnest (actual time=0.009..0.009 rows=5 loops=1)
   ->  Index Only Scan using demo_tenant_id on demo (actual time=3.113..1994.448 rows=1923830 loops=1)
         Index Cond: (tenant_id = ANY (ARRAY[unnest.unnest, $1, $2, ..., $1023]))
         Heap Fetches: 0
 Planning Time: 0.569 ms
 Execution Time: 3206.993 ms
 Peak Memory Usage: 494 kB
(11 rows)
Enter fullscreen mode Exit fullscreen mode

Without the Cost Based Optimizer enabled, I needed to force the YB Batched Nested Loop Join. Performance remained the same, so I chose to use the string representation of the array which will be easier to set in a variable.

From a session variable

My goal is to set the array as a session parameter but session variables are scalar. I can set it as a string representation of an array:

yugabyte=# set rls.tenant_id='{A,E,I,O,U}';
SET
Enter fullscreen mode Exit fullscreen mode

The value can be read in a query with current_setting()

WHERE clause with current_setting()

Until a RLS policy is defined, this string can be read and cast as an array in the WHERE clause:

yugabyte=# explain (costs off, analyze)
select tenant_id from demo 
where tenant_id = ANY ( current_setting('rls.tenant_id')::text[] ) ;

                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Index Only Scan using demo_tenant_id on demo (actual time=2.991..2439.003 rows=1923830 loops=1)
   Index Cond: (tenant_id = ANY ((current_setting('rls.tenant_id'::text))::text[]))
   Heap Fetches: 0
 Planning Time: 1.600 ms
 Execution Time: 2557.392 ms
 Peak Memory Usage: 24 kB
(6 rows)
Enter fullscreen mode Exit fullscreen mode

Even if execution time is the same, casting text to an array at runtime is not optimal. Query planner can make better decisions with known values for selectivity.

Immutable PostgreSQL function

I create a function to read the session variable and cast it to an ARRAY. I define the function as immutable so that it can be executed before query planning.

yugabyte=# 
create or replace function get_tenant_id_array(setting text) returns text[] as $$
 select current_setting(setting)::text[]
$$ language sql immutable;

CREATE FUNCTION
Enter fullscreen mode Exit fullscreen mode

Now, the casting is done once before execution and the result is visible in the WHERE clause

yugabyte=# explain (costs off, analyze)
select tenant_id from demo 
where tenant_id = ANY ( get_tenant_id_array('rls.tenant_id') ) ;

                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Index Only Scan using demo_tenant_id on demo (actual time=3.958..2338.873 rows=1923830 loops=1)
   Index Cond: (tenant_id = ANY ('{A,E,I,O,U}'::text[]))
   Heap Fetches: 0
 Planning Time: 1.125 ms
 Execution Time: 2454.236 ms
 Peak Memory Usage: 14 kB
(6 rows)
Enter fullscreen mode Exit fullscreen mode

This function has been defined and optimized for use in a WHERE clause.

Row Level Security policy

I am now ready to implement this as a RLS policy:

yugabyte=> create POLICY tenant_id ON demo FOR ALL TO public
    USING (
        tenant_id = ANY ( get_tenant_id_array('rls.tenant_id') )
    );

CREATE POLICY

yugabyte=> ALTER TABLE demo ENABLE ROW LEVEL SECURITY;
ALTER TABLE
Enter fullscreen mode Exit fullscreen mode

To test it, I create a user with the session parameter already set, grant SELECT on my table, and connect with this user

yugabyte=> create user franck password 'bsky-social-hhae3-zsl2a';
CREATE ROLE

yugabyte=> grant select on demo to franck;
GRANT

yugabyte=> \c - franck

psql (16.0, server 11.2-YB-2.18.3.0-b0)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
You are now connected to database "yugabyte" as user "franck".

Enter fullscreen mode Exit fullscreen mode

Now, running the same query without a WHERE clause but the session parameter set to the list of tenants I want to be restricted by the RLS policy:

yugabyte=> set rls.tenant_id='{A,E,I,O,U}';
SET

yugabyte=# explain (costs off, analyze)
select tenant_id from demo;

                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Index Only Scan using demo_tenant_id on demo (actual time=3.029..2302.233 rows=1923830 loops=1)
   Index Cond: (tenant_id = ANY ('{A,E,I,O,U}'::text[]))
   Heap Fetches: 0
 Planning Time: 0.178 ms
 Execution Time: 2417.699 ms
 Peak Memory Usage: 14 kB
(6 rows)
Enter fullscreen mode Exit fullscreen mode

My list of tenants was added automatically by the Row Level Security policy with the array known at query planning time.

Datatype, Array definition, and indexing

In order to cast the string of array values, you can use any datatype and change the return datatype for the function accordingly. For example, you can replace all text[] by uuid[]:

yugabyte=> alter table demo add tenant_uuid uuid unique;
ALTER TABLE

yugabyte=> create or replace function get_tenant_uuid_array(setting text) returns uuid[] as $$
 select current_setting(setting)::uuid[]
$$ language sql immutable;
CREATE FUNCTION

yugabyte=> set rls.tenant_uuid='{d1ab011c-dead-face-c0de-dec1a551f1ed,ca5caded-face-dead-c0de-c1a551f1ab1e}';
SET

yugabyte=> explain select * from demo where tenant_uuid = ANY ( get_tenant_uuid_array('rls.tenant_uuid'));
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Index Scan using demo_tenant_uuid_key on demo  (cost=0.00..4.12 rows=1 width=36)
   Index Cond: (tenant_uuid = ANY ('{d1ab011c-dead-face-c0de-dec1a551f1ed,ca5caded-face-dead-c0de-c1a551f1ab1e}'::uuid[]))
(2 rows)
Enter fullscreen mode Exit fullscreen mode

I prefer using PostgreSQL array literal to define arrays. It involves enclosing the values in curly braces and separating them with a comma. Alternatively, you can use STRING_TO_ARRAY. However, it requires two arguments: the string and the delimiter.

yugabyte=> create or replace function get_tenant_uuid_array(setting text, delimiter text) returns uuid[] as $$
 select STRING_TO_ARRAY(current_setting(setting),delimiter)::uuid[]
$$ language sql immutable;
CREATE FUNCTION

yugabyte=> set rls.tenant_uuid='d1ab011c-dead-face-c0de-dec1a551f1ed ca5caded-face-dead-c0de-c1a551f1ab1e';
SET

yugabyte=> explain select * from demo where tenant_uuid = ANY ( get_tenant_uuid_array('rls.tenant_uuid',' '));

                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Index Scan using demo_tenant_uuid_key on demo  (cost=0.00..4.12 rows=1 width=36)
   Index Cond: (tenant_uuid = ANY ('{d1ab011c-dead-face-c0de-dec1a551f1ed,ca5caded-face-dead-c0de-c1a551f1ab1e}'::uuid[]))
(2 rows)
Enter fullscreen mode Exit fullscreen mode

Note that you cannot use ARRAY as a constructor in this case, as it takes multiple scalar values rather than a single character string that can be defined in a session variable.

That technique is optimal and can be used in both PostgreSQL and YugabyteDB. In particular, YugabyteDB has the advantage of using only a single Index Only Scan to seek and read the next rows within the five ranges, similar to a loose index scan (thanks to Hybrid Scan).

Optimized Policy For Singletons

The previous RLS policy always applies an = ANY condition. With YugabyteDB, this is optimal as one Index Scan can go to multiple ranges in one call, with an index skip scan. The order of the rows returned are those if the index. I've added value to the index, defined as (tenant_id HASH, value ASC) and this helps to use an Index Only Scan:

yugabyte=# set rls.tenant_id='{A,E,I,O,U}';
SET
yugabyte=# explain (costs off, analyze) 
           select tenant_id from demo order by value
;

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Sort (actual time=159.815..180.212 rows=192772 loops=1)
   Sort Key: value
   Sort Method: external merge  Disk: 3416kB
   ->  Index Only Scan using demo_tenant_id on demo (actual time=3.406..83.238 rows=192772 loops=1)
         Index Cond: (tenant_id = ANY ('{A,E,I,O,U}'::text[]))
         Heap Fetches: 0
 Planning Time: 0.159 ms
 Execution Time: 219.653 ms
 Peak Memory Usage: 5768 kB
(9 rows)
Enter fullscreen mode Exit fullscreen mode

Because if scanning multiple range, an additional Sort operation was needed for order by value.

Unfortunately, if a set a singleton array, the query planner doesn't realize that there's only one range, which is sorted on value and still adds a Sort operation:

yugabyte=# set rls.tenant_id='{Y}';
SET
yugabyte=# explain (costs off, analyze) 
           select tenant_id from demo order by value
;

                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Sort (actual time=47.869..49.779 rows=38674 loops=1)
   Sort Key: value
   Sort Method: quicksort  Memory: 3179kB
   ->  Index Only Scan using demo_tenant_id on demo (actual time=1.502..36.325 rows=38674 loops=1)
         Index Cond: (tenant_id = ANY ('{Y}'::text[]))
         Heap Fetches: 0
 Planning Time: 0.156 ms
 Execution Time: 57.721 ms
 Peak Memory Usage: 3516 kB
(9 rows)
Enter fullscreen mode Exit fullscreen mode

If you expect singletons, it may be useful to handle two cases in the RLS policy:

yugabyte=# \c - yugabyte
psql (16.0, server 11.2-YB-2.19.3.0-b0)
You are now connected to database "yugabyte" as user "yugabyte".

yugabyte=# drop POLICY tenant_id ON demo;
DROP POLICY

yugabyte=# create POLICY tenant_id ON demo FOR ALL TO public
    USING (
        case
         when array_length(get_tenant_id_array('rls.tenant_id'),1) = 1
         -- equals to the first element when there's only one
         then tenant_id = (get_tenant_id_array('rls.tenant_id'))[1]
         -- use =ANY when there's more elements
         else tenant_id = ANY ( get_tenant_id_array('rls.tenant_id') )
        end
    );
CREATE POLICY

yugabyte=# \c - franck
psql (16.0, server 11.2-YB-2.19.3.0-b0)
You are now connected to database "yugabyte" as user "franck".
Enter fullscreen mode Exit fullscreen mode

The same as before is generated for multi-values array:

yugabyte=# set rls.tenant_id='{A,E,I,O,U}';
SET

yugabyte=# explain (costs off) 
           select tenant_id from demo order by value;

                          QUERY PLAN
---------------------------------------------------------------
 Sort
   Sort Key: value
   ->  Index Only Scan using demo_tenant_id on demo
         Index Cond: (tenant_id = ANY ('{A,E,I,O,U}'::text[]))
(4 rows)
Enter fullscreen mode Exit fullscreen mode

An optimized version is generated with a singleton:

yugabyte=# set rls.tenant_id='{Y}';
SET

yugabyte=# explain (costs off) 
           select tenant_id from demo order by value;

                  QUERY PLAN
----------------------------------------------
 Index Only Scan using demo_tenant_id on demo
   Index Cond: (tenant_id = 'Y'::text)
(2 rows)
Enter fullscreen mode Exit fullscreen mode

With this new policy, the Sort operation is avoided when not needed.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .