Find hotspots with Yugabyte Active Session History

Franck Pachot - Jul 20 - - Dev Community

In a previous post, I explained that the default PgBench workload is not scalable, with all sessions updating the same row, and how to solve it:

This table row is a hotspot in a distributed database. In the following article, I demonstrate how to detect hotspots in YugabyteDB using Active Session History.

I launch a single-node YugabyteDB cluster with my Active Session History dashboard, using my experimental docker-compose:


git clone git@github.com:FranckPachot/yb-perf-hub.git
cd yb-perf-hub

# Start a RF1 cluster
docker compose -f docker-compose-startyb.yaml up -d  yugabytedb --scale yugabytedb=1 --no-recreate

# Start Grafana dashboard to connect to it
sed -e '$a'"ip_of_yugabytedb_database=$(docker compose -f docker-compose-startyb.yaml exec yugabytedb hostname -i)"  .env > .lab.env
docker compose --env-file=.lab.env up -d

Enter fullscreen mode Exit fullscreen mode

I initialize PgBench and run the default workload from ten clients:


docker compose -f docker-compose-startyb.yaml run -e PGPASSWORD=yugabyte yugabytedb /home/yugabyte/postgres/bin/ysql_bench -h yb-perf-hub-yugabytedb-1 -i


docker compose -f docker-compose-startyb.yaml run -e PGPASSWORD=yugabyte yugabytedb /home/yugabyte/postgres/bin/ysql_bench -h yb-perf-hub-yugabytedb-1 -n -c 10 -T 3600 -P 5 --max-tries 100

Enter fullscreen mode Exit fullscreen mode

Here is what I can see from my Grafana dashboard, all based on yb_active_session_history, pg_stat_statements, and yb_local_tablets

YSQL: Query Processing

Image description

TServer: Conflicting Transactions

Image description

Statements: UPDATE tellers

Image description

Image description

Tablets: tellers

Image description

Image description

The dashboard shows different dimensions separately: wait events, statements, and tablets. However, Active Session History can do better because it links all of them in each sample. In the following query, the samples are grouped based on these dimensions to identify hotspots with comprehensive information:

docker compose -f docker-compose-startyb.yaml run -e PGPASSWORD=yugabyte yugabytedb ysqlsh -h yb-perf-hub-yugabytedb-1 -xc "

select sum(sample_weight),
 wait_event_component, wait_event_type, wait_event, wait_event_class, client_node_ip, query, table_type, namespace_name, ysql_schema_name, table_name, partition_key_start, partition_key_end
from yb_active_session_history as h
natural left outer join (
 select queryid as query_id,
 query from pg_stat_statements
) as statements
natural left outer join (
 select substr(tablet_id,1,15) as wait_event_aux,
 table_type, namespace_name, ysql_schema_name, table_name, partition_key_start, partition_key_end from yb_local_tablets
) as tablets
group by
 wait_event_component, wait_event_type, wait_event, wait_event_class, client_node_ip, query, table_type, namespace_name, ysql_schema_name, table_name, partition_key_start, partition_key_end
order by sum(sample_weight) desc fetch first 1 rows only

"

Enter fullscreen mode Exit fullscreen mode

Here is the sample with the highest count:

-[ RECORD 1 ]--------+----------------------------------------------------------------------
sum                  | 12779
wait_event_component | TServer
wait_event_type      | WaitOnCondition
wait_event           | ConflictResolution_WaitOnConflictingTxns
wait_event_class     | TabletWait
client_node_ip       | 0.0.0.0:0
query                | UPDATE ysql_bench_tellers SET tbalance = tbalance + $1 WHERE tid = $2
table_type           | YSQL
namespace_name       | yugabyte
ysql_schema_name     | public
table_name           | ysql_bench_tellers
partition_key_start  |
partition_key_end    |

Enter fullscreen mode Exit fullscreen mode

All the information is there. The hotspot is on ysql_bench_tellers (with no partition key start/end because I have a single tablet here), waiting on ConflictResolution_WaitOnConflictingTxns in the tablet server when executing UPDATE ysql_bench_tellers.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .