I got a question about the results of this "benchmark", which show the following funny numbers:
If this were true, it would be a great win for all the Distributed SQL databases tested there: response times in microseconds 😂. Well, I guess this one is a typo.
For the fun of it, I'll run their benchmark with YugabyteDB.
Single-node cluster
I start a single-node YugabyteDB cluster:
docker network create yb
docker run -d --name yb-n1 --hostname n1 --network yb -p15433:15433 yugabytedb/yugabyte yugabyted start --advertise_address=n1.yb --background=false
I exposed port 15433, which is the YugabyteDB console:
I run this "hugedbbench" (from the last commit that works, because it seems broken now) and display the response time of the queries from pg_stat_statements:
docker exec -it yb-n1 bash -c '
# get the project
dnf install -y git golang
git clone https://github.com/kokizzu/hugedbbench.git
cd hugedbbench/2021/yugabytedb
git checkout 0fdc96905d319751a79cc386f78f79e64ab22d43
# drop the table from a previous run and reset pg_stat_statements
ysqlsh -h n1.yb -c "
drop table if exists bar1
" -c "
select pg_stat_statements_reset()
"
# run the "benchmark"
sed -e "s/127.0.0.1/n1.yb/" -i main_test.go
go test
# show execution statistics
ysqlsh -h n1.yb -c "
select mean_time, min_time, max_time, query, calls, rows
from pg_stat_statements order by 1
" -c "
select pg_size_pretty(pg_table_size(tablename::regclass)) from pg_tables where tableowner=user
"
'
It runs for a few minutes, displaying the total time:
Here is the result from pg_stat_statements:
mean_time | min_time | max_time | query | calls | rows
------------------+------------+------------+--------------------------------------------------------------------------------+--------+--------
0.231428 | 0.231428 | 0.231428 | select pg_stat_statements_reset() | 1 | 1
1.82675776968002 | 0.279921 | 71.312284 | SELECT foo FROM bar1 WHERE id=$1 | 100000 | 100000
4.66872027791997 | 0.690059 | 216.301717 | INSERT INTO bar1(id,foo) VALUES($1,$2) | 100000 | 100000
5.46740847204005 | 1.285634 | 424.903964 | UPDATE bar1 SET foo=$1 WHERE id=$2 | 100000 | 100000
263.1718355 | 122.983744 | 403.359927 | SELECT COUNT($1) FROM bar1 | 2 | 2
830.000174 | 830.000174 | 830.000174 | TRUNCATE TABLE bar1 | 1 | 0
1413.53706 | 1413.53706 | 1413.53706 | CREATE TABLE IF NOT EXISTS bar1(id BIGINT PRIMARY KEY, foo VARCHAR(10) UNIQUE) | 1 | 0
(7 rows)
pg_size_pretty
----------------
132 MB
(1 row)
The one-row DML statements run in single-digit milliseconds on average, which is what is expected. You can run the same with other databases but, whatever the results, they don't really make sense.
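The published chart shows microseconds, while pg_stat_statements reports mean_time in milliseconds. As a quick sanity check, here is a minimal sketch that converts the mean_time of the three one-row DML statements, copied from the output above, to microseconds. The helper name to_micro is only for illustration.

```shell
#!/usr/bin/env bash
# Convert pg_stat_statements mean_time (milliseconds) to microseconds.
# The values below are copied from the output above; nothing is queried here.
set -euo pipefail

to_micro() {
  # $1: a duration in milliseconds; prints it in microseconds, rounded
  awk -v ms="$1" 'BEGIN { printf "%.0f\n", ms * 1000 }'
}

# Each line is "mean_time|query"; the quoted heredoc keeps $1 and $2 literal
while IFS='|' read -r mean query; do
  printf '%8s us  %s\n' "$(to_micro "$mean")" "$query"
done <<'EOF'
1.82675776968002|SELECT foo FROM bar1 WHERE id=$1
4.66872027791997|INSERT INTO bar1(id,foo) VALUES($1,$2)
5.46740847204005|UPDATE bar1 SET foo=$1 WHERE id=$2
EOF
```

It prints about 1827 µs for the SELECT, 4669 µs for the INSERT and 5467 µs for the UPDATE: expressed in microseconds, these are numbers in the thousands.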
The query stats are also visible in the console:
Replication Factor 3 cluster
You can add more nodes to the YugabyteDB cluster:
docker run -d --name yb-n2 --hostname n2 --network yb yugabytedb/yugabyte yugabyted start --join=yb-n1 --advertise_address=n2.yb --background=false
docker run -d --name yb-n3 --hostname n3 --network yb yugabytedb/yugabyte yugabyted start --join=yb-n1 --advertise_address=n3.yb --background=false
The data protection has switched to Replication Factor 3:
However, this benchmark is not relevant for such a configuration because it is single-threaded. When you distribute the database, you expect to distribute the load. And if you do it only for High Availability, then you want the nodes in different zones, which is not the case for three containers on the same host. That's why I stop here. There are much more significant workloads to run if you want to compare databases.
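The single-threaded limitation can be put in numbers: one session waits for each statement to complete before sending the next, so it cannot exceed 1000 / mean_time (in ms) statements per second, however many nodes the cluster has. Here is a minimal sketch of that arithmetic using the server-side mean_time values from pg_stat_statements above. It ignores the client and network round trips, so the real ceiling is even lower. The function name max_ops_per_sec is hypothetical.

```shell
#!/usr/bin/env bash
# Upper bound on statements per second for ONE session that sends the next
# statement only after the previous one completes: 1000 ms / mean_time (ms).
# Client-side and network time are ignored, which can only lower the result.
set -euo pipefail

max_ops_per_sec() {
  # $1: mean server-side time per statement, in milliseconds
  awk -v ms="$1" 'BEGIN { printf "%.0f\n", 1000 / ms }'
}

# mean_time of SELECT, INSERT and UPDATE from the pg_stat_statements output
for ms in 1.82675776968002 4.66872027791997 5.46740847204005; do
  echo "mean ${ms} ms -> at most $(max_ops_per_sec "$ms") statements/s per session"
done
```

This gives roughly 547 SELECTs, 214 INSERTs or 183 UPDATEs per second for one session. Adding nodes does not raise these figures; only more concurrent sessions would put the extra nodes to work.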
All databases also have some optimizations that may not be enabled by default, for various reasons (backward compatibility and rolling upgrades are some of them). If you want to run this benchmark in an optimized way on YugabyteDB, I suggest enabling Packed Rows and Colocation.
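As a sketch only, assuming a YugabyteDB version that has the ysql_enable_packed_row tablet server flag and the COLOCATION database option (older versions used COLOCATED = true, and recent ones may already enable packed rows by default), the single-node setup could look like this. The database name hugedbbench is hypothetical, and the benchmark's connection string in main_test.go would also have to point to it, which I have not checked.

```shell
# Start the node with packed rows enabled for YSQL tables (assumed flag name)
docker run -d --name yb-n1 --hostname n1 --network yb -p15433:15433 \
  yugabytedb/yugabyte yugabyted start --advertise_address=n1.yb \
  --tserver_flags="ysql_enable_packed_row=true" --background=false

# Wait until YSQL accepts connections
until docker exec yb-n1 ysqlsh -h n1.yb -c "select 1" >/dev/null 2>&1; do
  sleep 5
done

# Create a colocated database, where the tables share a single tablet
# (hypothetical database name; check the syntax for your version)
docker exec yb-n1 ysqlsh -h n1.yb -c "create database hugedbbench with colocation = true"
```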