Quick ๐Ÿ“Š on ๐Ÿ˜ active SQL from pg_stat_activity

Franck Pachot - Jan 14 '22 - - Dev Community

pg_stat_statements can identify long queries, but are limited when we need to go further: the queries are normalized with values replaced by parameters. And the metrics gathered are cumulative, about time, with min, max, average, stddev, usually over a long time window. It is hard to catch the statements responsible for the activity at peak load, and when caught, not easy to reproduce the execution plan, know the user, or the application that ran it.

pg_stat_activity holds this information and here is small script to sample it. The goal of sampling the running statements is to start the investigation by looking at the top queries. However, they are all different, with different parameters. What I'm doing here is aggregate the ones sharing the same execution plan. For each active queries in pg_stat_activity I run an explain and remove the parts with values (Filter and Index Conditions). Then I take a md5 of it and this will become my "Plan Hash Value".

The function ignores the statements that cannot be explained, so this works only when the commands are single query.

Sampling function

This installs my sampling function and the table to store the plans:

drop table if exists yb_ash_plans;
drop function if exists yb_ash_sample;

create table if not exists yb_ash_plans(
 plan_hash_value text primary key, 
 samples int, running_for interval,last_plan text,last_pg_stat_activity jsonb
 );
create or replace function yb_ash_sample() returns int as
 $$
declare
 psa record;
 plan_xml text;
 plan_nofilter text;
 plan_md5 text;
 last_running_for interval;
 count_updated int:=0;
begin
  for psa in (select * from pg_stat_activity
   where state='active' and pid!=pg_backend_pid()) loop
   begin 
    --raise notice 'SQL %',psa.query;
    execute format('explain (format xml, costs false, buffers false) %s',psa.query) into plan_xml;
    exception when others then
     if sqlstate not like '42%' then raise warning '% % %',sqlstate,sqlerrm,psa.query; end if;
    end;
   end loop;
   if plan_xml is not null then
    --raise notice 'XML %',plan_xml;
    plan_nofilter:=regexp_replace(plan_xml,'(<Item>|<Filter>|<Index-Cond>|<Recheck-Cond>|<Join-Filter>|<Merge-Cond>|<Hash-Cond>)[^<]*(</)','\1\2','g'); 
    plan_md5=md5(plan_nofilter);
    --raise notice e'PHV   %\n%\n%',plan_md5,plan_xml,plan_nofilter;
    update yb_ash_plans set samples=samples+1 where plan_hash_value=plan_md5 returning running_for into last_running_for;
    --raise notice e'LRF   % <> %',last_running_for,clock_timestamp()-psa.query_start;
    if last_running_for is null then
      insert into yb_ash_plans(plan_hash_value,samples,running_for,last_plan,last_pg_stat_activity)
       values(plan_md5,1,clock_timestamp()-psa.query_start,plan_xml,row_to_json(psa));
      count_updated:=count_updated+1;
     else
      if last_running_for < clock_timestamp()-psa.query_start then
       update yb_ash_plans
        set running_for=clock_timestamp()-psa.query_start,
        last_plan=plan_xml, last_pg_stat_activity=row_to_json(psa)
        where plan_hash_value=plan_md5;
       count_updated:=count_updated+1;
      end if;
     end if;
   end if;
  return count_updated; 
 end;
$$ language plpgsql;

Enter fullscreen mode Exit fullscreen mode

Calling the yb_ash_sample() function reads pg_stat_activity. For each query, it attempts and explain (format xml), removes Filter and Index conditions with a regexp_replace and calculate a md5 of it, which identifies the execution plan.

Then, if it is a new one, it will be inserted, with the pg_stat_activity information from this sample. If it already exists, then it will compare the running time. If higher, the pg_stat_activity will be updated so that we record a query with a long running time. If not, it will only increases the call count. A high count means that either a query takes long (active during many samples) or is short but frequent enough to be catched by the sampling. This is how I'll identify the statements for which an improvement can have a significant impact on the system.

Sampling loop

To collect the statements, I run the sampling in a loop. Every one second can take some resources. 10 seconds may be a safer frequency if running longer.

run sampling from psql \watch

select yb_ash_sample();
\watch 1
Enter fullscreen mode Exit fullscreen mode

or with a loop:

create or replace procedure yb_ash_loop(seconds int default 1)
 as $$
  declare n int; 
  begin loop select yb_ash_sample(), pg_sleep(seconds) into n; 
  raise notice '% Updated statements: %',clock_timestamp(),n;
  commit;
  end loop; end;
  $$ language plpgsql;

call yb_ash_loop(1);

Enter fullscreen mode Exit fullscreen mode

In YugabyteDB it catches the activity on the node where it is connected to. This may be sufficient when the load is distributed by the smart driver or a reverse proxy. But it is important to keep it in mind.

Check long queries

The yb_ash_plans stores the plans, in the xml format, but the most important is the query which you can run and explain. Tuning will probably start with the ones with the more samples.

select samples,running_for,last_pg_stat_activity->>'query' query
 from yb_ash_plans 
order by samples desc limit 10;
Enter fullscreen mode Exit fullscreen mode

Here is a small example:

yugabyte=> select samples,running_for,last_pg_stat_activity->>'query' query
 from yb_ash_plans
order by samples desc limit 10;
 samples |   running_for   |                                   query
---------+-----------------+----------------------------------------------------------------------------
      54 | 00:00:59.299518 | select pg_sleep(60);
      20 | 00:00:00.066935 | select count(*) from pgbench_history ;
       1 | 00:00:00.087005 | UPDATE pgbench_accounts SET abalance = abalance + -2743 WHERE aid = 42216;
       1 | 00:00:00.014626 | SELECT abalance FROM pgbench_accounts WHERE aid = 23344;
       1 | 00:00:00.01092  | UPDATE pgbench_branches SET bbalance = bbalance + -4999 WHERE bid = 1;
(5 rows)
Enter fullscreen mode Exit fullscreen mode

As a very general approximation, if you divide the the number of samples by the number of sampling, you have the average active sessions running this statement. Here, pg_sleep(60) has been run once during the 1 second sampling, so we have approximately 60 samples. The "running for" is the longest running time that has been seen and gives an idea to differentiate long running queries (like this pg_sleep(60) which has been sampled after 59.29 seconds of run time) from frequently run short queries (like this select count(*) that has been seen 20 times but never on runs longer than 0.06 seconds).

I have left the XML plan as text so that we can extract some information from it. For example, you can try this:

with top_plans as (
select samples,running_for,last_pg_stat_activity->>'query' query,last_plan
 from yb_ash_plans 
order by samples desc limit 5
), top_plan_elements as (
select 
 samples,running_for
,json_build_object(
'tables',json_agg(distinct regexp_replace(plan_relation,'.*<Relation-Name>(.*)</Relation-Name>.*','\1')),
'aliases',json_agg(distinct regexp_replace(plan_alias,'.*<Alias>(.*)</Alias>.*','\1')),
'nodes',json_agg(distinct regexp_replace(plan_node,'.*<Node-Type>(.*)</Node-Type>.*','\1')),
'conditions',json_agg(distinct regexp_replace(plan_cond,'.*<.*-Cond>(.*)</.*-Cond>.*','\1')),
'filters',json_agg(distinct regexp_replace(plan_filter,'.*<Filter>(.*)</Filter>.*','\1'))
) plan_elements
,query
from top_plans
,lateral (select * from (select regexp_split_to_table(last_plan,'^ *','n') plan_relation ) plan_relation where plan_relation like '<Relation-Name>%'  ) plan_relations
,lateral (select * from (select regexp_split_to_table(last_plan,'^ *','n') plan_alias ) plan_alias where plan_alias like '<Alias>%'  ) plan_aliases
,lateral (select * from (select regexp_split_to_table(last_plan,'^ *','n') plan_node ) plan_nodes where plan_node like '<Node-Type>%'  ) plan_nodes
,lateral (select * from (select regexp_split_to_table(last_plan,'^ *','n') plan_filter ) plan_filter where plan_filter like '<Filter>%'  ) plan_filters
,lateral (select * from (select regexp_split_to_table(last_plan,'^ *','n') plan_cond ) plan_cond where plan_cond like '<%-Cond>%'  ) plan_conds
group by samples,running_for,query,last_plan
) select * 
from top_plan_elements
;
Enter fullscreen mode Exit fullscreen mode

Try it during your stress tests and it should show the most active queries, which are those where an optimization can have a significant impact on the system. Any comments welcome.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .