Aurora Limitless - Sequences

Franck Pachot - Nov 30 - - Dev Community

Sequences are commonly used in SQL to generate unique numbers of small size and clustered to minimize the scattering of rows. However, managing sequences at scale in a distributed database can be difficult, particularly when sharding is involved. Let's examine how this is implemented in Aurora Limitless.

I have a two-shard Aurora Limitless cluster:

postgres_limitless=> select * from rds_aurora.limitless_subclusters order by 1;
 subcluster_id | subcluster_type
---------------+-----------------
 2             | router
 3             | router
 4             | shard
 5             | shard
(4 rows)
Enter fullscreen mode Exit fullscreen mode

I have created the PgBench tables as described in the previous post. "pgbench_history" has no primary key:

postgres_limitless=> \d pgbench_history
              Partitioned table "public.pgbench_history"
 Column |            Type             | Collation | Nullable | Default
--------+-----------------------------+-----------+----------+---------
 tid    | integer                     |           |          |
 bid    | integer                     |           |          |
 aid    | integer                     |           |          |
 delta  | integer                     |           |          |
 mtime  | timestamp without time zone |           |          |
 filler | character(22)               |           |          |
Partition key: HASH (aid)
Number of partitions: 2 (Use \d+ to list them.)
Enter fullscreen mode Exit fullscreen mode

Let's try to add a sequence as a primary key. I can create the sequence explicitly or implicitly with the BIGSERIAL datatype:

postgres_limitless=> alter table pgbench_history add id bigserial;
ALTER TABLE
Enter fullscreen mode Exit fullscreen mode

The definition of the sequence from the PostgreSQL catalog shows a max value of 250000:

postgres_limitless=> \x
Expanded display is on.
postgres_limitless=> select * from pg_sequences where schemaname='public';
-[ RECORD 1 ]-+-----------------------
schemaname    | public
sequencename  | pgbench_history_id_seq
sequenceowner | postgres
data_type     | bigint
start_value   | 1
min_value     | 1
max_value     | 250000
increment_by  | 1
cycle         | f
cache_size    | 1
last_value    |
Enter fullscreen mode Exit fullscreen mode

Keep in mind that I'm connected to a router, and the PostgreSQL catalog views do not provide a complete picture.

The Aurora Limitless global view provides a definition for the entire cluster.

postgres_limitless=> select * from rds_aurora.limitless_global_sequence_view;
 schema_name |     sequence_name      | lastval | minvalue |      maxvalue       | start | inc | cache
-------------+------------------------+---------+----------+---------------------+-------+-----+-------
 public      | pgbench_history_id_seq | 1000000 |        1 | 9223372036854775807 |     1 |   1 |     1
(1 row)
Enter fullscreen mode Exit fullscreen mode

The max value here is what PostgreSQL would have set - the maximum of a bigint. However, the last value is relatively high - one million.

Another Aurora Limitless view shows more info:

postgres_limitless=> select * from rds_aurora.limitless_sequence_metadata_view limit 5;
-[ RECORD 1 ]-----+-----------------------
subcluster_id     | 2
sequence_name     | pgbench_history_id_seq
schema_name       | public
active_minvalue   | 1
active_maxvalue   | 250000
reserved_minvalue | 500001
reserved_maxvalue | 750000
chunk_size        | 250000
chunk_state       | 1
local_last_value  |
-[ RECORD 2 ]-----+-----------------------
subcluster_id     | 3
sequence_name     | pgbench_history_id_seq
schema_name       | public
active_minvalue   | 250001
active_maxvalue   | 500000
reserved_minvalue | 750001
reserved_maxvalue | 1000000
chunk_size        | 250000
chunk_state       | 1
local_last_value  |
Enter fullscreen mode Exit fullscreen mode

Subclusters 2 and 3 function as the routers, indicating that there is a sequence for each router rather than for each shard. These sequences have a chunk size of 250000, which is used to reserve ranges of values on each router. The router identified by subcluster_id 2 has an active range of values from 1 to 250000. The router with subcluster_id 3 has an active range from 250001 to 500000.

Aurora Limitless utilizes this metadata to create a PostgreSQL sequence on each router, ensuring that the ranges do not overlap. For instance, I have observed that the router I am connected to has a sequence with a minimum value of 1 and a maximum value of 250000, which corresponds to the definition of the router with subcluster_id 2.

I can confirm my assumption by querying the local version of the global view:

postgres_limitless=> select * from rds_aurora.limitless_local_sequence_metadata_view;
-[ RECORD 1 ]-----+-----------------------
subcluster_id     | 2
sequence_name     | pgbench_history_id_seq
schema_name       | public
active_minvalue   | 1
active_maxvalue   | 250000
reserved_minvalue | 500001
reserved_maxvalue | 750000
chunk_size        | 250000
chunk_state       | 1
local_last_value  |
Enter fullscreen mode Exit fullscreen mode

A parameter determines the size of each chunk:

postgres_limitless=> show rds_aurora.limitless_sequence_chunk_size;
 rds_aurora.limitless_sequence_chunk_size
------------------------------------------
 250000
(1 row)
Enter fullscreen mode Exit fullscreen mode

I run PgBench transactions that add a row in "pgbench_history" with "mtime" from the wall clock:

postgres_limitless=> \! pgbench -c 10 -P 15 -t 100000 -r -f simple-update.sql
pgbench (16.2, server 16.4)
starting vacuum...end.
progress: 15.0 s, 4.1 tps, lat 790.825 ms stddev 228.164, 0 failed
progress: 30.0 s, 12.9 tps, lat 780.043 ms stddev 202.696, 0 failed
progress: 45.0 s, 12.2 tps, lat 822.944 ms stddev 239.396, 0 failed
progress: 60.0 s, 12.7 tps, lat 784.624 ms stddev 217.702, 0 failed
progress: 75.0 s, 13.0 tps, lat 768.676 ms stddev 216.321, 0 failed
...
Enter fullscreen mode Exit fullscreen mode

The ten connections are running in from multiple routers. Each router is assigning the next value from its sequence within its reserved range
:

postgres_limitless=> select * from pgbench_history 
                     order by mtime limit 10
;
 tid | bid |  aid  | delta |           mtime            | filler |   id
-----+-----+-------+-------+----------------------------+--------+--------
   1 |   1 | 33344 |  3767 | 2024-11-30 13:59:36.904277 |        |      2
   5 |   1 | 53196 |  3601 | 2024-11-30 13:59:36.90429  |        | 250005
   9 |   1 | 28785 | -1508 | 2024-11-30 13:59:36.904318 |        | 250003
   5 |   1 | 21155 |   -74 | 2024-11-30 13:59:36.904355 |        |      1
   4 |   1 | 18697 |  4201 | 2024-11-30 13:59:36.904436 |        | 250001
   6 |   1 | 66855 | -2115 | 2024-11-30 13:59:36.904513 |        | 250004
   8 |   1 | 82644 |  1249 | 2024-11-30 13:59:36.904715 |        |      3
   5 |   1 | 92352 |  4698 | 2024-11-30 13:59:36.904855 |        | 250002
   3 |   1 |  1526 |  2719 | 2024-11-30 13:59:36.904982 |        | 250006
   8 |   1 |  1474 |  4422 | 2024-11-30 13:59:36.905173 |        | 250007
(10 rows)
Enter fullscreen mode Exit fullscreen mode

I mentioned that I wanted to make this "id" column the primary key for the table, but this is not possible because the table is shared on another column:

postgres_limitless=> alter table pgbench_history add primary key(id);
ERROR:  PRIMARY KEY constraint on sharded table must include the shard key
DETAIL:  PRIMARY KEY constraint on table "pgbench_history" lacks column "aid" which is part of the shard key.
Enter fullscreen mode Exit fullscreen mode

I can add the sharding key to the primary key definition:

postgres_limitless=> alter table pgbench_history add primary key(id,aid);
ALTER TABLE
Enter fullscreen mode Exit fullscreen mode

This creates an index for fast access by an ID. In theory, there should not be duplicates thanks to the reserved sequence ranges, but the shards do not verify this.

Note that it is impossible to set a chunk size of 1 to try to get an ordered sequence:

postgres_limitless=> set rds_aurora.limitless_sequence_chunk_size to 1;
SET
postgres_limitless=> alter table pgbench_history add id bigserial;
ERROR:  MINVALUE (1) must be less than MAXVALUE (1)
Enter fullscreen mode Exit fullscreen mode

Sequences should not be utilized to generate no-gap numbers. They are used to generate unique numbers in a scalable manner. Aurora Limitless achieves this by implementing sequences in the routers and utilizing reserved ranges. This approach is analogous to sequence caching, which sets aside ranges of numbers. PostgreSQL caches sequence ranges for each connection, while YugabyteDB can cache them per connection or per server node. In contrast, Aurora Limitless employs the PostgreSQL cache within a single router and incorporates an additional mechanism to reserve larger ranges for the routers.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .