The features discussed in previous posts of this series are comparable to other database-sharding solutions for PostgreSQL, such as Citus. We have noted similar limitations regarding cross-shard performance and features. However, there is one significant difference: in Citus, cross-shard reads provide eventual consistency (see an example in How ACID is Citus?). In contrast, cross-shard reads in Aurora Limitless are strongly consistent.
Let's demonstrate the same as in How ACID is Citus?, with the PgBench tables created in the previous post.
I set the balance of all accounts to zero:
postgres_limitless=> update pgbench_accounts set abalance=0;
UPDATE 100000
postgres_limitless=> select count(*),min(abalance),max(abalance),sum(abalance)
from pgbench_accounts;
count | min | max | sum
--------+-----+-----+-----
100000 | 0 | 0 | 0
(1 row)
I'll run the following script that transfers money from one account to another, so that the sum of all balances do not change:
cat > /tmp/pgbench-acid.sql <<'SQL'
\set aid1 random(1, 100000 * :scale)
\set aid2 random(1, 100000 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid1;
UPDATE pgbench_accounts SET abalance = abalance - :delta WHERE aid = :aid2;
END;
SQL
I initially ran the script before updating the balance, but encounter a distributed lock error:
\! pgbench -f /tmp/pgbench-acid.sql -T 300 -c 10 &
postgres_limitless=> update pgbench_accounts set abalance=0`
;
ERROR: aborting transaction participating in a distributed deadlock
This is different than PostgreSQL where you don't expect such retryable error in Read Committed isolation.
To workaround this, I locked the table before updating it:
begin transaction;
lock table pgbench_accounts in share mode;
update pgbench_accounts set abalance=0;
commit;
Once the balance reset, and running the transfer transaction in the background, I query the total balance:
select count(*),min(abalance),max(abalance),sum(abalance)
from pgbench_accounts;
\watch 0.1
The sum always remains the same, which is proof that it reads a consistent snapshot:
Aurora Limitless uses the wall clock to compare read times with commit times, ensuring consistent read snapshots across multiple shards. Since the clocks in each shard can drift, it calculates the clock skew to guarantee consistency. This approach is similar to that used by YugabyteDB. Running on AWS allows for precise time synchronization, thanks to Time Sync, which can provide a bounded clock time. I have detailed this in Achieving Precise Clock Synchronization on AWS.
With this, Aurora Limitless provides the same transaction semantics as PostgreSQL for Read Committed and Repeatable Read. Serializable is not supported as it would require a distributed lock manager for the predicate locks. The reads from the router get a consistent snapshot from all shards. The writes use a two-phase commit (2PC) with automatic recovery of pending transactions in case of failover.