In YugabyteDB, yb_stats takes all available statistics and either shows them (in ad-hoc mode) or saves them for later investigation (in snapshot mode). One of the topics that can be investigated is memory and memory usage.
It's important to realise that memory usage is visible from 3 different sources, and that a fourth important source doesn't externalise its memory usage: YSQL, alias PostgreSQL.
- Node exporter: Node exporter shows total machine memory statistics. That means the node exporter view is a superset of information, of which the tablet server and master memory usage are a part.
- Tablet server: A tablet server shows statistics at two levels: at the level of the memory allocator, tcmalloc, and memory allocations tracked by the memtracker framework. This excludes operating system level usage, like mapped executables and libraries.
- Master server: A master server shows statistics at two levels: at the level of the memory allocator, tcmalloc, and memory allocations tracked by the memtracker framework. This excludes operating system level usage, like mapped executables and libraries.
- YSQL (PostgreSQL): The YSQL layer consists of processes that are very similar to those of PostgreSQL. For these, there is no general view that shows the YSQL-only memory usage, just like this is not available in PostgreSQL. Newer versions of PostgreSQL have a heap function, and newer versions of YSQL do provide similar functions, but these are per process, and not externalised.
node exporter basic memory statistics
To see the basic operating system level memory statistics, the --gauges-enable flag must be used, because node exporter exports the memory statistics as gauges. The statistic names are filtered on node_memory_Mem and node_memory_Swap. This is what that looks like:
➜ yb_stats --gauges-enable --stat-name-match 'node_memory_(Mem|Swap)' --hostname-match 80:9300
Begin ad-hoc in-memory snapshot created, press enter to create end snapshot for difference calculation.
Time between snapshots: 1.476 seconds
192.168.66.80:9300 gauge node_memory_MemAvailable_bytes 1433604096.000000 -151552
192.168.66.80:9300 gauge node_memory_MemFree_bytes 896311296.000000 -151552
192.168.66.80:9300 gauge node_memory_MemTotal_bytes 1900789760.000000 +0
192.168.66.80:9300 gauge node_memory_SwapFree_bytes 2206199808.000000 +0
192.168.66.80:9300 gauge node_memory_SwapTotal_bytes 2206199808.000000 +0
These are the statistics for a single node; in a cluster you would typically have more nodes. These statistics come from the Linux /proc meta-filesystem, and the source is /proc/meminfo.
Description:
- MemTotal: this is the total amount of memory available to the operating system, and therefore does not fluctuate.
- MemFree: this is a statistic that is hardly ever useful, because the kernel only keeps a minimal amount of memory free (set by vm.min_free_kbytes). During the machine's lifetime, MemFree will decrease until it reaches vm.min_free_kbytes, and then the kernel tries to keep it at that number. This is not an indicator of available memory.
- MemAvailable: this is the most useful memory statistic of all: it tells the amount of memory that the kernel considers to be available for immediate use. This number can fluctuate, but consistently low values mean too much memory is allocated. Low means lower than approximately 5% of MemTotal.
- Swap: These are useful to understand if swap is in use, and if so, how much of the swap is used.
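The 5% rule of thumb for MemAvailable can be checked with a few lines of Python. This is only a sketch: the function names and the threshold parameter are made up for illustration, the percentage is assumed to be relative to MemTotal, and the input values are the ones from the yb_stats output above.

```python
# Sketch: express MemAvailable as a percentage of MemTotal and flag low values.
# The function names and the 5% default are illustrative assumptions.

def mem_available_pct(mem_available_bytes: float, mem_total_bytes: float) -> float:
    """Return MemAvailable as a percentage of MemTotal."""
    if mem_total_bytes <= 0:
        raise ValueError("MemTotal must be positive")
    return 100.0 * mem_available_bytes / mem_total_bytes


def is_memory_low(mem_available_bytes: float, mem_total_bytes: float,
                  threshold_pct: float = 5.0) -> bool:
    """True when available memory is below the threshold percentage."""
    return mem_available_pct(mem_available_bytes, mem_total_bytes) < threshold_pct


# Values taken from the yb_stats output above (node 192.168.66.80:9300).
available = 1433604096
total = 1900789760
pct = mem_available_pct(available, total)
print(f"MemAvailable is {pct:.1f}% of MemTotal, low: {is_memory_low(available, total)}")
```

A single low reading is not a problem by itself; the point made above is that consistently low values deserve attention, so a check like this is best applied to a series of snapshots.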
node exporter detailed memory statistics
The /proc/meminfo meta-file contains many more statistics. However, the collection of figures in /proc/meminfo is quite diverse: it consists of several independent groups of statistics that only have in common that they are about memory usage. That also makes it hard to understand.
However, using the statistics in /proc/meminfo we can pick a few statistics that tell us something.
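To make the relation between /proc/meminfo and the node exporter statistics more concrete, here is a minimal Python sketch that parses meminfo-style text into byte values. The sample text is a hand-made excerpt based on the values shown earlier, not a capture of a real file, and the node_memory_..._bytes naming simply mirrors the statistic names visible in the yb_stats output; treat it as an approximation of what node exporter does.

```python
# Sketch: parse /proc/meminfo-style text into a dict of values.
# SAMPLE is a hand-made excerpt for illustration; on a Linux machine the real
# file could be read with open("/proc/meminfo").read() instead.

SAMPLE = """\
MemTotal:        1856240 kB
MemFree:          875304 kB
MemAvailable:    1400004 kB
SwapTotal:       2154492 kB
SwapFree:        2154492 kB
"""


def parse_meminfo(text: str) -> dict:
    """Return a dict of statistic name -> value, with kB values converted to bytes."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        key = f"node_memory_{name.strip()}"
        # In /proc/meminfo, "kB" means units of 1024 bytes.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
            key += "_bytes"
        stats[key] = value
    return stats


stats = parse_meminfo(SAMPLE)
for key, value in stats.items():
    print(key, value)
```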
anonymous and cached
➜ yb_stats --hostname-match 80:9300 --adhoc-metrics-diff --gauges-enable --stat-name-match 'memory_(MemAvailable|Cached|MemTotal|AnonPages|MemFree|Swap)'
Begin ad-hoc in-memory metrics snapshot created, press enter to create end snapshot for difference calculation.
Time between snapshots: 1.864 seconds
192.168.66.80:9300 gauge node_memory_AnonPages_bytes 351571968.000000 +3092480
192.168.66.80:9300 gauge node_memory_Cached_bytes 805695488.000000 +0
192.168.66.80:9300 gauge node_memory_MemAvailable_bytes 1219727360.000000 -3207168
192.168.66.80:9300 gauge node_memory_MemFree_bytes 552243200.000000 -3207168
192.168.66.80:9300 gauge node_memory_MemTotal_bytes 1900789760.000000 +0
192.168.66.80:9300 gauge node_memory_SwapFree_bytes 2206199808.000000 +0
192.168.66.80:9300 gauge node_memory_SwapTotal_bytes 2206199808.000000 +0
This is how to look at these figures:
alloc | size |
---|---|
MemTotal | 1900M |
MemFree | 552M |
Cached | 806M |
Anonymous | 352M |
%others% | 190M |
MemAvailable | 1220M |
- MemTotal is what is available to Linux.
- MemFree is what is actually free.
- Cached is all of file backed memory (including in use).
- Anonymous is memory that is privately allocated by processes (not file backed), and is mostly in use.
- Others: MemTotal-(MemFree+Cached+Anonymous) leaves around 190M for kernel allocations and others.
- The kernel thinks it can make 1220M available for use without the need to involve paging to the swap device.
These are dynamic classifications. The total memory size is 1900M, and with the above distribution of memory, there is 1220M available (from free memory, but also from other memory that can be 'repurposed').
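The %others% row in the table is not a statistic by itself; it is derived from the other figures. A small Python sketch of that arithmetic, using the byte values from the yb_stats output above (the variable and function names are only illustrative):

```python
# Sketch: derive the "others" figure from the gauges in the yb_stats output.
MEM_TOTAL = 1900789760
MEM_FREE = 552243200
CACHED = 805695488
ANON_PAGES = 351571968
MEM_AVAILABLE = 1219727360


def to_mb(num_bytes: int) -> int:
    """Convert bytes to megabytes (10^6 bytes), rounded to the nearest integer."""
    return round(num_bytes / 1_000_000)


# Everything that is not free, not file backed cache and not anonymous memory.
others = MEM_TOTAL - (MEM_FREE + CACHED + ANON_PAGES)

for label, value in [
    ("MemTotal", MEM_TOTAL),
    ("MemFree", MEM_FREE),
    ("Cached", CACHED),
    ("Anonymous", ANON_PAGES),
    ("%others%", others),
    ("MemAvailable", MEM_AVAILABLE),
]:
    print(f"{label:<13}{to_mb(value):>6}M")
```

The result for others is about 191M, which corresponds to the roughly 190M in the table.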
YSQL allocations
Now let's see how that works by making PostgreSQL allocate an array in PLpgSQL. The code for that:
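-- Size in bytes of each array element, and the number of elements to allocate.
set my.size to 1020;
set my.count to 1000000;
do
$$
declare
  array text[];
  counter int:= current_setting('my.count',true);
  size int:= current_setting('my.size',true);
begin
  raise info 'Pid: %', pg_backend_pid();
  raise info 'Array element size: %, count: %', size, counter;
  -- Fill the array, which allocates memory in the backend process.
  for count in 1..counter loop
    array[count]:=repeat('x',size);
  end loop;
  raise info 'done!';
  -- Keep the backend and its allocation around while the end snapshot is taken.
  perform pg_sleep(60);
end
$$;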
Execute it in the following way:
- Logon to PostgreSQL (psql)/YugabyteDB (ysqlsh)
- Execute yb_stats --hostname-match 192.168.66.80:9300 --adhoc-metrics-diff --gauges-enable --stat-name-match 'memory_(MemAvailable|Cached|MemTotal|AnonPages|MemFree)', and wait for the message that indicates it has taken the begin snapshot: Begin ad-hoc in-memory metrics snapshot created, press enter to create end snapshot for difference calculation.
- Execute the above anonymous PLpgSQL procedure with the memory counter adjusted for your available memory, and wait until it says 'done!'.
- Press enter in the terminal where yb_stats has made its first in-memory snapshot, so it takes another snapshot and shows the difference.
This is what it shows for me:
➜ yb_stats --hostname-match 80:9300 --adhoc-metrics-diff --gauges-enable --stat-name-match 'memory_(MemAvailable|Cached|MemTotal|AnonPages|MemFree|Swap)'
Begin ad-hoc in-memory metrics snapshot created, press enter to create end snapshot for difference calculation.
Time between snapshots: 12.998 seconds
192.168.66.80:9300 gauge node_memory_AnonPages_bytes 1452355584.000000 +1057705984
192.168.66.80:9300 gauge node_memory_Cached_bytes 187269120.000000 -620449792
192.168.66.80:9300 gauge node_memory_MemAvailable_bytes 150810624.000000 -1025363968
192.168.66.80:9300 gauge node_memory_MemFree_bytes 89874432.000000 -416792576
192.168.66.80:9300 gauge node_memory_MemTotal_bytes 1900789760.000000 +0
192.168.66.80:9300 gauge node_memory_SwapCached_bytes 7012352.000000 +7012352
192.168.66.80:9300 gauge node_memory_SwapFree_bytes 2154631168.000000 -51568640
192.168.66.80:9300 gauge node_memory_SwapTotal_bytes 2206199808.000000 +0
- The approximate size of the allocation for the anonymous PLpgSQL procedure is 1000000*1048=1,048,000,000. That is close to +1057705984. It is clear the array is allocated from anonymous memory.
- Most of the memory allocated to anonymous memory is removed from available memory.
- 620M is taken from Cached, 417M is taken from Free memory.
- 52M is paged out to the swap device.
The conclusion is that memory usage by YSQL processes is mainly taken from Anonymous memory.
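The accounting in the list above can be reproduced by parsing the yb_stats difference lines. The following Python sketch assumes the line layout seen in this article (host, type, statistic name, value, difference); that layout is inferred from the output shown here, not from a format specification, and the function name is hypothetical.

```python
# Sketch: parse yb_stats ad-hoc diff lines and account for the anonymous growth.
# The five-field layout is inferred from the output in this article only.

OUTPUT = """\
192.168.66.80:9300 gauge node_memory_AnonPages_bytes 1452355584.000000 +1057705984
192.168.66.80:9300 gauge node_memory_Cached_bytes 187269120.000000 -620449792
192.168.66.80:9300 gauge node_memory_MemAvailable_bytes 150810624.000000 -1025363968
192.168.66.80:9300 gauge node_memory_MemFree_bytes 89874432.000000 -416792576
192.168.66.80:9300 gauge node_memory_MemTotal_bytes 1900789760.000000 +0
192.168.66.80:9300 gauge node_memory_SwapCached_bytes 7012352.000000 +7012352
192.168.66.80:9300 gauge node_memory_SwapFree_bytes 2154631168.000000 -51568640
192.168.66.80:9300 gauge node_memory_SwapTotal_bytes 2206199808.000000 +0
"""


def parse_diffs(text: str) -> dict:
    """Return a dict of statistic name -> difference between the snapshots."""
    diffs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[1] != "gauge":
            continue  # skip anything that is not a gauge line
        diffs[fields[2]] = int(fields[4])  # int() accepts a leading + or -
    return diffs


diffs = parse_diffs(OUTPUT)
anon_growth = diffs["node_memory_AnonPages_bytes"]
expected = 1_000_000 * 1048  # the estimate used in the text above

# Memory given up by the page cache, free memory and pages moved to swap.
taken = -(diffs["node_memory_Cached_bytes"]
          + diffs["node_memory_MemFree_bytes"]
          + diffs["node_memory_SwapFree_bytes"])

print(f"anonymous growth:       {anon_growth}")
print(f"growth minus estimate:  {anon_growth - expected}")
print(f"taken from other areas: {taken}")
```

The growth in anonymous memory and the memory taken from Cached, MemFree and swap are in the same order of magnitude, which supports the reading that the array pushed out cached pages and consumed free memory.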
Tablet server allocations
What about the YugabyteDB tablet server?
For this, I created a small configurable PLpgSQL procedure with variables:
-- Number of tables, rows per table, size of the text column and tablets per table.
set my.tables to 8;
set my.rows to 4000;
set my.size to 1020;
set my.tablets to 1;
do
$$
declare
  tables int:= current_setting('my.tables',true);
  rows int:= current_setting('my.rows',true);
  size int:= current_setting('my.size',true);
  tablets int:= current_setting('my.tablets',true);
begin
  --
  raise info 'Pid: %', pg_backend_pid();
  raise info 'Nr. tables: %, rows: %, textsize: %', tables, rows, size;
  raise info 'rowsize: %, total size: %',
    pg_column_size(1)+pg_column_size(repeat('x',size)),
    rows*(pg_column_size(1)+pg_column_size(repeat('x',size)));
  for table_counter in 1..tables loop
    raise info 'table: %/%', table_counter, tables;
    -- Recreate the table with the requested number of tablets.
    execute format('drop table if exists table%s cascade', table_counter);
    execute format('create table table%s (id int primary key, f1 text) split into %s tablets', table_counter, tablets);
    -- Insert the rows one by one.
    for row_counter in 1..rows loop
      execute format('insert into table%s (id, f1) values (%s, ''%s'')', table_counter, row_counter, repeat('x',size));
    end loop;
  end loop;
  raise info 'Done.';
end
$$;
What this does is quite naively create tables and insert data into them (this is not highly optimised code).
However, the purpose of the above code is to create memtables and fill these up, because these will increase the tablet server memory footprint.
The results below are from a small 3-node YugabyteDB RF3 cluster, which means that despite the leaders of the tablets being distributed over the cluster nodes, each cluster node will get either the leader or a follower of each tablet.
I ran this on a newly created cluster, on a freshly started first node, after taking an in-memory snapshot in the following way, which is identical to the earlier YSQL level test case:
yb_stats --hostname-match 80:9300 --adhoc-metrics-diff --gauges-enable --stat-name-match 'memory_(MemAvailable|Cached|MemTotal|AnonPages|MemFree|Swap)'
After the procedure has run and created and filled 8 tables, press enter to show the difference:
➜ ./target/release/yb_stats --hostname-match 80:9300 --adhoc-metrics-diff --gauges-enable --stat-name-match 'memory_(MemAvailable|Cached|MemTotal|AnonPages|MemFree|Swap)'
Begin ad-hoc in-memory snapshot created, press enter to create end snapshot for difference calculation.
Time between snapshots: 138.111 seconds
192.168.66.80:9300 gauge node_memory_AnonPages_bytes 306294784.000000 +145600512
192.168.66.80:9300 gauge node_memory_Cached_bytes 691851264.000000 +76595200
192.168.66.80:9300 gauge node_memory_MemAvailable_bytes 1310580736.000000 -152317952
192.168.66.80:9300 gauge node_memory_MemFree_bytes 759713792.000000 -229179392
192.168.66.80:9300 gauge node_memory_MemTotal_bytes 1900789760.000000 +0
192.168.66.80:9300 gauge node_memory_SwapFree_bytes 2206199808.000000 +0
192.168.66.80:9300 gauge node_memory_SwapTotal_bytes 2206199808.000000 +0
- The server did not perform any paging, which can be seen from the unchanged Swap statistics.
- Memory Available decreased, which is logical, because the tablet server allocates memory for the memtables.
- The memory was taken from Free Memory, which is also logical, because just after startup, the tablet server and other processes have not paged in a lot of memory yet.
- The Cached Memory statistic increased somewhat. This is because the new tables have their memtables written in memory only, but actual writes are still performed for the WAL to guarantee persistence, and these will need cached pages.
- The amount of Anonymous Memory did increase the most. This is the tablet server increasing its memory to facilitate the memtables mostly, and all kinds of surrounding allocations with it.
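To see how the drop in Free Memory is distributed, the differences from the output above can be added up. This is a small Python sketch with the values copied from that output; the names are only illustrative.

```python
# Sketch: account for the decrease of MemFree in the tablet server test.
# The differences are copied from the yb_stats output above.
tserver_deltas = {
    "AnonPages": 145600512,
    "Cached": 76595200,
    "MemFree": -229179392,
    "MemAvailable": -152317952,
}

free_drop = -tserver_deltas["MemFree"]
accounted = tserver_deltas["AnonPages"] + tserver_deltas["Cached"]
remainder = free_drop - accounted  # went to other (kernel) allocations
anon_share = 100.0 * tserver_deltas["AnonPages"] / free_drop

print(f"MemFree decrease:        {free_drop}")
print(f"Anonymous plus Cached:   {accounted}")
print(f"not accounted for:       {remainder}")
print(f"anonymous share of drop: {anon_share:.0f}%")
```

Anonymous and Cached together explain nearly all of the decrease in Free Memory, with anonymous memory as the largest part.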
Conclusion
In most cases, whenever YSQL processes or the tablet servers get active, they can allocate memory. The memory that is allocated is mostly Anonymous memory. Therefore, the anonymous memory statistic is a good indicator of YSQL and tablet server memory usage.
Please mind that because YugabyteDB relies on operating system caching for IO, just like PostgreSQL does, it needs to have a reasonable amount of 'Cached' memory too. The hard part is understanding what a 'reasonable amount' is; in any case, the amount needs to be considerable enough to function as a buffer for reads and writes.