Process Watch displays the instruction mix of the processes running on your Linux system. It recently added support for the Arm architecture, and works well on AWS Graviton processors. It can be used to quickly identify application usage of floating point, NEON, SVE, and SVE2 vector instructions.
After analyzing a running system or application with Process Watch you may be able to increase performance by recompiling applications, updating libraries, or generally finding ways to improve performance using vector instructions.
Read on to find out how to install and use Process Watch.
What software is required to build Process Watch?
The Process Watch source code is available on GitHub.
You will need various tools to build Process Watch from source:
- CMake
- Clang
- LLVM
- libelf
The instructions below are for an AWS Graviton-based EC2 instance running Ubuntu 24.04. You can modify them as needed for your Linux distribution.
To install the required tools run:
sudo apt-get update
sudo apt-get install libelf-dev cmake clang llvm llvm-dev python-is-python3 -y
Where is the Process Watch source code?
Use git to clone the Process Watch repository:
git clone --recursive https://github.com/intel/processwatch.git
Make sure to include the --recursive option to clone all submodules.
If you are curious, the submodules are:
Change to the repository directory:
cd processwatch
What is the best way to build Process Watch?
To build Process Watch, run the build script included in the repository:
./build.sh -b
You will see the following output:
Compiling dependencies...
No system bpftool found! Compiling libbpf and bpftool...
Compiling capstone...
Building the 'insn' BPF program:
Gathering BTF information for this kernel...
Compiling the BPF program...
Stripping the object file...
Generating the BPF skeleton header...
Linking the main Process Watch binary...
You now have the processwatch binary in the top-level directory of the repository.
For convenience, copy it to a common place included in your search path:
sudo cp ./processwatch /usr/local/bin
Do I need to run Process Watch as root?
You can run Process Watch as a non-root user, but doing so requires the modifications shown below, which decrease system security.
To enable non-root users to run Process Watch, run the three commands below. If you plan to run Process Watch with sudo, you can skip these commands and go to the next section.
sudo setcap CAP_PERFMON,CAP_BPF=+ep /usr/local/bin/processwatch
sudo sysctl -w kernel.perf_event_paranoid=-1
sudo sysctl -w kernel.unprivileged_bpf_disabled=0
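To see the current values of these two kernel settings, you can read them back. The sketch below is an illustration only and is not part of Process Watch. It relies on the fact that sysctl keys are exposed as files under /proc/sys, with the dots in the key name mapped to directory separators. The helper name read_sysctl and its proc_root parameter are assumptions made for this example.

```python
from pathlib import Path


def read_sysctl(key: str, proc_root: str = "/proc/sys") -> str:
    """Return the current value of a sysctl key such as 'kernel.perf_event_paranoid'.

    read_sysctl and proc_root are hypothetical names used for illustration.
    """
    # A key like 'kernel.perf_event_paranoid' maps to
    # /proc/sys/kernel/perf_event_paranoid.
    path = Path(proc_root).joinpath(*key.split("."))
    return path.read_text().strip()


if __name__ == "__main__":
    for key in ("kernel.perf_event_paranoid", "kernel.unprivileged_bpf_disabled"):
        try:
            print(f"{key} = {read_sysctl(key)}")
        except OSError as err:
            # The file may be missing or unreadable on some systems.
            print(f"{key}: could not read ({err})")
```

Recording the values before you change them makes it easier to restore the original settings later.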
How do I run Process Watch?
Process Watch accepts a number of command-line arguments. You can view these by running:
sudo processwatch -h
The output is:
usage: processwatch [options]
options:
-h Displays this help message.
-v Displays the version.
-i <int> Prints results every <int> seconds.
-n <num> Prints results for <num> intervals.
-c Prints all results in CSV format to stdout.
-p <pid> Only profiles <pid>.
-m Displays instruction mnemonics, instead of categories.
-s <samp> Profiles instructions with a sampling period of <samp>.
-f <filter> Can be used multiple times. Defines filters for columns. Defaults to 'FPARMv8', 'NEON', 'SVE' and 'SVE2'.
-l Prints all available categories, or mnemonics if -m is specified.
-d Prints only debug information.
Without any options Process Watch:
- Prints results every two seconds
- Prints results until it is killed (using Ctrl+C)
- Prints all results in a table format on stdout
- Profiles all running processes
- Displays counts for the default filters, which are 'FPARMv8', 'NEON', 'SVE', and 'SVE2'
- Sets the sampling period to 10000 events
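If you launch Process Watch from a script, it can be convenient to assemble the argument list programmatically. The following is a minimal sketch that uses only the flags listed in the help output above. The function name build_processwatch_args and its parameter names are assumptions made for this example and are not part of Process Watch.

```python
from typing import List, Optional, Sequence


def build_processwatch_args(
    pid: Optional[int] = None,
    sample_period: Optional[int] = None,
    interval: Optional[int] = None,
    count: Optional[int] = None,
    filters: Sequence[str] = (),
    csv: bool = False,
    mnemonics: bool = False,
) -> List[str]:
    """Build an argument list for processwatch (hypothetical helper).

    Any option left at its default value is omitted, so Process Watch
    falls back to its own defaults described above.
    """
    args = ["processwatch"]
    if pid is not None:
        args += ["-p", str(pid)]            # only profile this PID
    if sample_period is not None:
        args += ["-s", str(sample_period)]  # sampling period
    if interval is not None:
        args += ["-i", str(interval)]       # seconds between reports
    if count is not None:
        args += ["-n", str(count)]          # number of intervals
    for name in filters:
        args += ["-f", name]                # -f can be used multiple times
    if csv:
        args.append("-c")                   # CSV output on stdout
    if mnemonics:
        args.append("-m")                   # mnemonics instead of categories
    return args


if __name__ == "__main__":
    print(" ".join(build_processwatch_args(interval=5, count=3)))
```

The resulting list can be passed to subprocess.run, prefixed with "sudo" if you have not enabled non-root access.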
What does the Process Watch output look like?
You can run Process Watch with no arguments:
sudo ./processwatch
The output is similar to:
PID      NAME             FPARMv8  NEON     SVE      SVE2     %TOTAL   TOTAL
ALL      ALL              0.00     0.29     0.00     0.00     100.00   346
17400    processwatch     0.00     0.36     0.00     0.00     80.64    279
254      systemd-journal  0.00     0.00     0.00     0.00     13.01    45
542      irqbalance       0.00     0.00     0.00     0.00     2.60     09
544      rs:main Q:Reg    0.00     0.00     0.00     0.00     2.02     07
560      snapd            0.00     0.00     0.00     0.00     1.16     04
296      multipathd       0.00     0.00     0.00     0.00     0.58     02

PID      NAME             FPARMv8  NEON     SVE      SVE2     %TOTAL   TOTAL
ALL      ALL              3.57     12.86    0.00     0.00     100.00   140
17400    processwatch     3.73     13.43    0.00     0.00     95.71    134
4939     sshd             0.00     0.00     0.00     0.00     2.86     04
296      multipathd       0.00     0.00     0.00     0.00     0.71     01
560      snapd            0.00     0.00     0.00     0.00     0.71     01

PID      NAME             FPARMv8  NEON     SVE      SVE2     %TOTAL   TOTAL
ALL      ALL              1.18     5.12     0.00     0.00     100.00   254
17400    processwatch     1.19     5.16     0.00     0.00     99.21    252
6651     packagekitd      0.00     0.00     0.00     0.00     0.39     01
4939     sshd             0.00     0.00     0.00     0.00     0.39     01
New output comes every two seconds, and the next samples are appended to the bottom of the output.
Use Ctrl+C to end processwatch.
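If you capture the table output to a file, a small parser makes it easier to work with. The sketch below is illustrative and assumes the layout seen in the sample output above: a header line starting with PID and NAME, followed by whitespace-separated rows. The function name parse_interval is an assumption made for this example. Because process names can contain spaces (for example "rs:main Q:Reg"), the numeric columns are taken from the right-hand side of each row.

```python
from typing import Dict, List, Union


def parse_interval(lines: List[str]) -> List[Dict[str, Union[str, float]]]:
    """Parse one interval of Process Watch table output (hypothetical helper).

    lines[0] must be the header line; the remaining lines are data rows.
    """
    header = lines[0].split()
    value_names = header[2:]  # every column after PID and NAME is numeric
    rows = []
    for line in lines[1:]:
        tokens = line.split()
        if len(tokens) < len(header):
            continue  # skip blank or truncated lines
        # Take the numeric columns from the right; whatever remains between
        # the PID and the numbers is the (possibly multi-word) process name.
        values = tokens[-len(value_names):]
        row: Dict[str, Union[str, float]] = {
            "PID": tokens[0],
            "NAME": " ".join(tokens[1:-len(value_names)]),
        }
        for name, value in zip(value_names, values):
            row[name] = float(value)
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = [
        "PID NAME FPARMv8 NEON SVE SVE2 %TOTAL TOTAL",
        "ALL ALL 0.00 0.29 0.00 0.00 100.00 346",
        "17400 processwatch 0.00 0.36 0.00 0.00 80.64 279",
        "544 rs:main Q:Reg 0.00 0.00 0.00 0.00 2.02 07",
    ]
    for row in parse_interval(sample):
        print(row["PID"], row["NAME"], row["NEON"], row["TOTAL"])
```

For longer captures, split the text at each header line and pass each interval to the parser separately.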
How can I use Process Watch to identify applications which could run faster?
To see an example of how Process Watch can be used, use a text editor to save the Python code below into a file named zip.py.
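import gzip

# Read the input file in 16 KiB chunks.
size = 16384

# Compress the file three times.
for _ in range(3):
    with open('largefile', 'rb') as f_in:
        # The with statement closes the output file automatically.
        with gzip.open('largefile.gz', 'wb') as f_out:
            while (data := f_in.read(size)):
                f_out.write(data)

print("Zip complete")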
The Python code reads a file named largefile and writes a compressed version as largefile.gz.
To create the input file, use the dd command:
dd if=/dev/zero of=largefile count=1M bs=1024
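The dd command writes 1M (1,048,576) blocks of 1024 bytes each, which produces a 1 GiB file of zeros. If dd is not available, a rough Python equivalent is sketched below. The function name create_zero_file and its parameters are assumptions made for this example, and the demonstration uses a much smaller size so that it runs quickly.

```python
import os
import tempfile


def create_zero_file(path: str, total_bytes: int, chunk_bytes: int = 1024 * 1024) -> int:
    """Write total_bytes of zeros to path and return the resulting file size.

    create_zero_file is a hypothetical helper, not part of Process Watch.
    """
    chunk = bytes(chunk_bytes)  # a reusable buffer of zero bytes
    remaining = total_bytes
    with open(path, "wb") as f_out:
        while remaining > 0:
            n = min(remaining, chunk_bytes)
            f_out.write(chunk[:n])
            remaining -= n
    return os.path.getsize(path)


if __name__ == "__main__":
    # Small demonstration size. To match the dd command, pass 1024 * 1024 * 1024.
    with tempfile.TemporaryDirectory() as tmp:
        size = create_zero_file(os.path.join(tmp, "smallfile"), 3 * 1024 * 1024 + 5)
        print(size)
```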
Next, use a text editor to save the script below in a file named run1.sh.
#!/bin/bash
python ./zip.py &
pid=$!
sudo processwatch -p $pid -s 1 -i 2 -f HasCRC -f HasNEON &
pid2=$!
sleep 30
kill $pid2
The script starts the Python code to compress largefile, then attaches processwatch to the Python process to monitor for CRC and NEON instructions. The -s 1 option sets the sampling period to 1, instead of the default of 10000 events, and -i 2 prints the output every 2 seconds.
File compression runs best when CRC instructions are used, so an absence of CRC instructions indicates room for performance improvement.
Run the script using:
bash ./run1.sh
The Process Watch output starts printing on stdout.
The output is similar to:
PID      NAME             CRC      NEON     %TOTAL   TOTAL
ALL      ALL              0.00     1.63     100.00   25466
1226     python           0.00     1.63     100.00   25466

PID      NAME             CRC      NEON     %TOTAL   TOTAL
ALL      ALL              0.00     1.96     100.00   23224
1226     python           0.00     1.96     100.00   23224
Notice that no CRC instructions are shown. This is because the version of zlib supplied by the Linux distribution is not compiled with CRC instructions.
You can confirm this by running objdump to disassemble the library and look for crc32 instructions.
objdump -d /usr/lib/aarch64-linux-gnu/libz.so.1 | awk -F" " '{print $3}' | grep crc32 | wc -l
If the result is 0 then there are no crc32 instructions used in the library.
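The same count can be done in Python on captured disassembly text. The sketch below mirrors the pipeline above: it looks at the third whitespace-separated field of each line (the field that awk prints as $3) and counts the lines where that field contains crc32. The function name count_mnemonic is an assumption made for this example, and the sample disassembly lines are illustrative rather than taken from a real library.

```python
def count_mnemonic(disassembly: str, needle: str = "crc32") -> int:
    """Count disassembly lines whose third field contains needle.

    count_mnemonic is a hypothetical helper that mirrors
    awk '{print $3}' | grep crc32 | wc -l.
    """
    count = 0
    for line in disassembly.splitlines():
        fields = line.split()
        # In objdump -d output the fields are: address, encoding, mnemonic, operands.
        if len(fields) >= 3 and needle in fields[2]:
            count += 1
    return count


if __name__ == "__main__":
    # Illustrative lines in the style of objdump -d output.
    sample = (
        "    2f10:\t1ac04020 \tcrc32b\tw0, w1, w0\n"
        "    2f14:\t91000421 \tadd\tx1, x1, #0x1\n"
        "    2f18:\t9ac04c20 \tcrc32x\tw0, w1, x0\n"
    )
    print(count_mnemonic(sample))
```

To run it against a real library, you could capture the disassembly with subprocess.run(["objdump", "-d", path], capture_output=True, text=True) and pass the stdout attribute to the function.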
Is there a way to use CRC instructions for file compression?
If there are no crc32 instructions in zlib then you can use zlib-cloudflare to increase application performance.
To download, build, and install zlib-cloudflare, run:
git clone https://github.com/cloudflare/zlib.git
cd zlib && ./configure
make && sudo make install
cd ..
With the new library installed, use a text editor to save the bash script below in a file named run2.sh:
#!/bin/bash
LD_PRELOAD=/usr/local/lib/libz.so python ./zip.py &
pid=$!
sudo processwatch -p $pid -s 1 -i 2 -f HasCRC -f HasNEON &
pid2=$!
sleep 10
kill $pid2
Notice that the new libz.so is preloaded with LD_PRELOAD, and that the CRC and NEON instruction counts are again printed every 2 seconds.
Run the new script using:
bash ./run2.sh
This time the CRC instructions are used, and performance is significantly better.
The output is similar to:
PID      NAME             CRC      NEON     %TOTAL   TOTAL
ALL      ALL              17.40    1.80     100.00   25246
1251     python           17.40    1.80     100.00   25246

PID      NAME             CRC      NEON     %TOTAL   TOTAL
ALL      ALL              17.33    2.54     100.00   23556
1251     python           17.33    2.54     100.00   23556
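One way to check the speed-up independently of Process Watch is to time the compression itself. The sketch below is illustrative and not part of the original example: time_gzip is a hypothetical helper that compresses an in-memory buffer in 16384-byte chunks, as zip.py does, and reports the elapsed time. Running it once normally and once with the LD_PRELOAD setting used in run2.sh should, in principle, show the difference between the two zlib builds, although the numbers depend on your system.

```python
import gzip
import io
import time
from typing import Tuple


def time_gzip(data: bytes, chunk_size: int = 16384) -> Tuple[float, bytes]:
    """Compress data with gzip in chunks; return (seconds, compressed bytes).

    time_gzip is a hypothetical helper used for illustration.
    """
    buffer = io.BytesIO()
    start = time.perf_counter()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as f_out:
        for offset in range(0, len(data), chunk_size):
            f_out.write(data[offset:offset + chunk_size])
    # Leaving the with block flushes the gzip trailer into the buffer.
    elapsed = time.perf_counter() - start
    return elapsed, buffer.getvalue()


if __name__ == "__main__":
    # 64 MiB of zeros, similar in content to largefile but much smaller.
    payload = bytes(64 * 1024 * 1024)
    seconds, compressed = time_gzip(payload)
    print(f"compressed {len(payload)} bytes to {len(compressed)} bytes in {seconds:.3f} s")
```

Take the timing without Process Watch attached, because sampling can add overhead to the profiled process.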
What else can I do with Process Watch?
Besides CRC instructions, you can use Process Watch to look for NEON, SVE, and SVE2 instructions. These are all vector extensions in Arm processors used to increase application performance.
To see examples of NEON and SVE refer to the Using Process watch Arm Learning Path.