⚠️ Do not run this test on a billed account. It reads a table multiple times, and Aurora I/O can be extremely expensive.
In a previous post I investigated buffer caching in Oracle Cloud PostgreSQL managed service. Now, let's do the same with Amazon Aurora.
I created an AWS db.r6g.2xlarge
instance with 8 vCPU (4 Intel cores) and 64GB RAM, with PostgreSQL 14.9 compatibility, in eu-west-1
:
The default configuration is:
postgres=> select aurora_version();
aurora_version
----------------
14.9.1
(1 row)
postgres=> \dconfig shared_buffers
List of configuration parameters
Parameter | Value
----------------+---------
shared_buffers | 5475728
(1 row)
postgres=> \dconfig effective_cache_size
List of configuration parameters
Parameter | Value
----------------------+------------
effective_cache_size | 43805824kB
(1 row)
The shared_buffers parameter is set to 5475728 blocks, which amounts to 41GB. The effective_cache_size has the same value as shared_buffers. This indicates that Amazon Aurora exclusively depends on PostgreSQL shared buffers in the database instance, without relying on a filesystem cache. This is understandable given the architecture of Amazon Aurora, which does not write to local files. Although the shared buffer cache is updated, it is never checkpointed to disk. Whenever there is a cache miss, the remote storage is accessed. It receives all Write-Ahead Logging (WAL), and applies it on the storage servers. Then it can provide the required version of a block that is not found in the PostgreSQL buffer cache.
Here is the script I used in the previous blog pos, scanning a table that increases by 100MB increments:
psql -e <<'SQL'
\timing on
drop table if exists large;
create table large ( filler text ) with (fillfactor=10);
insert into large
select string_agg( chr((32+random()*94)::int) ,'' )
from generate_series(1,1000) c, generate_series(1,12800) r
group by r
\; select 'size', pg_table_size('large') , pg_size_pretty( pg_table_size('large') ) , now()
\; explain (costs off, analyze, buffers) select * from large
\; explain (costs off, analyze, buffers) select * from large
\; explain (costs off, analyze, buffers) select * from large
\watch i=120 c=1000
SQL
⚠️ To avoid excessive I/O charges, I recommend running with larger increments than my 100MB precision.
Here is the read throughput for each size:
One observation is similar to the previous post that runs a fork of PostgreSQL that is very similar to the community PostgreSQL: once we read more than what fits in the shared_buffers
, 41GB here, the throughput drops. But it drops directly to the throughput of the network storage, as there is no filesystem cache involved here.
I gathered the explain (analyze, buffers)
around this point where we start to see some buffer reads in addition to shared hits:
Tue 28 Nov 2023 03:18:08 AM GMT (every 120s)
?column? | pg_table_size | pg_size_pretty | now
----------+---------------+----------------+-------------------------------
size | 44786343936 | 42 GB | 2023-11-28 03:18:08.475725+00
(1 row)
QUERY PLAN
----------------------------------------------------------------------
Seq Scan on large (actual time=0.007..4136.013 rows=5465600 loops=1)
Buffers: shared hit=5465600
Planning Time: 0.031 ms
Execution Time: 4479.278 ms
(4 rows)
Tue 28 Nov 2023 03:20:08 AM GMT (every 120s)
?column? | pg_table_size | pg_size_pretty | now
----------+---------------+----------------+-------------------------------
size | 44891226112 | 42 GB | 2023-11-28 03:20:08.474894+00
(1 row)
QUERY PLAN
----------------------------------------------------------------------
Seq Scan on large (actual time=0.006..4775.911 rows=5478400 loops=1)
Buffers: shared hit=5465327 read=13073
I/O Timings: read=174.088
Planning Time: 0.033 ms
Execution Time: 5120.147 ms
(5 rows)
The throughput at that time is 5465600 / 4136.013 = 1321 pages read per millisecond.
There's no further big drop as the filesystem cache is not used, all read
are from the storage servers. However, there's an interesting step when we scan 62GB:
Tue 28 Nov 2023 10:14:19 AM GMT (every 120s)
?column? | pg_table_size | pg_size_pretty | now
----------+---------------+----------------+-------------------------------
size | 66392915968 | 62 GB | 2023-11-28 10:14:19.480057+00
(1 row)
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on large (actual time=332.310..41903.010 rows=8102400 loops=1)
Buffers: shared hit=5467256 read=2635144
I/O Timings: read=33838.515
Planning Time: 0.032 ms
Execution Time: 42420.378 ms
(5 rows)
Tue 28 Nov 2023 10:16:34 AM GMT (every 120s)
?column? | pg_table_size | pg_size_pretty | now
----------+---------------+----------------+-------------------------------
size | 66497798144 | 62 GB | 2023-11-28 10:16:34.683262+00
(1 row)
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on large (actual time=332.333..70424.197 rows=8115200 loops=1)
Buffers: shared hit=5467258 read=2647942
I/O Timings: read=62311.966
Planning Time: 0.033 ms
Execution Time: 70943.117 ms
(5 rows)
An additional 100MB to scan has decreased the time from 41903 to 70424 millisecond. If I look at read
and I/O Timings
, this is going from 2635144/33838.515=78 pages per millisecond to 2647942/62311.966=42 pages per millisecond.
Something happened around when the table was above 62GB. Here as some metrics to try to understand.
Active Session History and Wait Events:
Aurora Storage Read Latency:
The read latency is 2.2 milliseconds until 10:00 when the table size was smaller than 62GB and then increased to reach 3.3 milliseconds at 12:00 when the table is 64GB.
Those are just observations. I have no explanations. Is there still some double-buffering effect with a filesystem cache on the storage server?
I stopped it for a few hour to let the storage cells cool down, and maybe let some compaction reduce the size of the Copy-on-Write filesystem, and re-started it. It was back to 2.3 millisecond latency quickly after inserting a few Gigabytes:
awk '
/every/{ d=$5" "$6 }
{gsub(/=/," ");print}
/Buffers:/{r=$NF ; s=($4 + $NF)*8/1024/1024}
/I[/]O Timings:/{t=$NF ; printf "\n--> %6.2f GB %6.2f pages/ms Buffers read=%s I/O Timings: read=%s %s\n\n", s, r/t,r,t,d }
' log-apg.txt | grep -- "^--> "
--> 97.85 GB 76.68 pages/ms Buffers read=7358466 I/O Timings: read=95968.147 04:57:14 PM
--> 97.85 GB 76.68 pages/ms Buffers read=7358466 I/O Timings: read=95961.648 04:57:14 PM
--> 97.85 GB 76.32 pages/ms Buffers read=7358425 I/O Timings: read=96415.408 04:57:14 PM
...
--> 99.22 GB 75.93 pages/ms Buffers read=7537544 I/O Timings: read=99271.602 06:17:58 PM
--> 99.22 GB 76.18 pages/ms Buffers read=7537544 I/O Timings: read=98943.058 06:17:58 PM
--> 99.32 GB 76.45 pages/ms Buffers read=7550345 I/O Timings: read=98767.120 06:23:51 PM
--> 99.32 GB 76.49 pages/ms Buffers read=7550345 I/O Timings: read=98705.841 06:23:51 PM
--> 99.32 GB 76.46 pages/ms Buffers read=7550345 I/O Timings: read=98754.641 06:23:51 PM
--> 99.41 GB 69.18 pages/ms Buffers read=7571397 I/O Timings: read=109438.664 06:29:41 PM
--> 99.41 GB 40.49 pages/ms Buffers read=7566829 I/O Timings: read=186884.136 06:29:41 PM
--> 99.41 GB 43.34 pages/ms Buffers read=7564209 I/O Timings: read=174524.194 06:29:41 PM
--> 99.51 GB 42.19 pages/ms Buffers read=7577001 I/O Timings: read=179585.400 06:38:25 PM
--> 99.51 GB 40.36 pages/ms Buffers read=7576994 I/O Timings: read=187734.618 06:38:25 PM
Then I stopped again, 5 hours and started only the selects without increasing the size:
I have noticed a consistent 2.3 millisecond latency for a while, followed by a 3.7 millisecond latency. I am unsure of the reason for this inconsistency, and it highlights the limitations of my current approach. In order to verify the cache efficiency with user queries, one must have a good understanding of how it works. PostgreSQL and Linux are open source and have accessible documentation. However, RDS Aurora's storage system is more complex and has many undocumented layers in the compute and storage instances, making it a black box for the users where it is difficult to have explanation for the performance we observe.