In the previous post, I installed a PostgreSQL extension to YugabyteDB by building it. Some extensions are available in the PostgreSQL Global Development Group YUM repository. Here is how to get it from there into a YugabyteDB.
I'm using YugabyteDB 2.17, which is compatible with PostgreSQL 11.2, on Centos7. Here are the available .rpm
:
https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-7-x86_64/
For Centos8, pleas look at the next post:
Install extensions from PGDG repo to YugabyteDB - example with sequential_uuids
Franck Pachot for YugabyteDB Distributed PostgreSQL Database ・ Dec 22 '22
Example with sequential_uuids
Vlad Mihalcea asked if we have a Time-Sorted Unique Identifier in YugabyteDB. I answered that we probably don't need it but that there is probably a PostgreSQL extension that works with YugabyteDB:
https://twitter.com/FranckPachot/status/1600743186326245376
And, yes, there is, available from the YUM repository. I'll take this as an example.
I do all that in a temporary directory to avoid overwriting existing files
Download the RPM
I can download the .rpm
from the online repo:
yum install -y wget
wget -c https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-7-x86_64/sequential_uuids_11-1.0.2-1.rhel7.x86_64.rpm
or though YUM after installing the repo:
yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
yum install --downloadonly -y --downloaddir=$PWD sequential_uuids_11
Dependencies
If downloaded directly, I can check the dependencies with rpm -qRp
:
If downloaded with yum, the dependencies have been downloaded. I can also I can get them with yum -q deplist
and they were displayed when downloading:
Note that I don't need the postgresql11
ones because I'll use YugabyteDB instead.
Extract the extension to YugabyteDB
My YugabyteDB installation dir is /home/yugabyte
and the extension files (share/extensions/*.{control,sql}
and libs/*.so
) must go into /home/yugabyte/postgres
.
Because the RPM installs it in /usr/pgsql-11
, I create a symbolic link in my temporary directory and them with a relative path:
mkdir -p ./usr &&
ln -Ts /home/yugabyte/postgres ./usr/pgsql-11 &&
rpm2cpio ./sequential_uuids_11-*.rpm |
cpio -idmv --no-absolute-filenames
Using -v
, it displays the extracted files:
Thanks to the symbolic link, they are in the YugabyteDB postgres directory.
The temporary directory can be removed.
Testing the extension
You must install the extension on all nodes of your YugabyteDB server.
Then, when connected to any node, load the extension and play with it:
yugabyte=# create extension sequential_uuids;
CREATE EXTENSION
yugabyte=# create table demo ( id uuid, primary key(id asc) );
yugabyte=# insert into demo
select uuid_time_nextval(5,x'ffff'::int)
from generate_series(1,5)
returning *;
yugabyte=# \watch
Thu 22 Dec 2022 08:24:15 AM UTC (every 2s)
id
--------------------------------------
9da30789-478e-4abd-857e-92c60dfbd375
9da3dd77-0f14-46ae-90ce-5ffb40531bae
9da34a5d-f400-4496-a0f4-7db59c8790cb
9da3f9c6-acb9-4f10-a9e8-498767290ca6
9da35df5-1698-4371-88cb-077d10c678db
(5 rows)
Thu 22 Dec 2022 08:24:17 AM UTC (every 2s)
id
--------------------------------------
9da3b4a3-448c-4f5f-b70c-d420175fc335
9da39dac-494a-458a-a797-61d6c30a53ca
9da30ae4-ecbe-4ab6-a795-3c56b1fe46ba
9da3e66a-510e-45bd-a8b7-f1afaa84865c
9da32dca-4a4d-48fb-ba20-b4c2f367fed2
(5 rows)
Thu 22 Dec 2022 08:24:19 AM UTC (every 2s)
id
--------------------------------------
9da31ec6-78aa-493e-b0a9-382a5d92a54c
9da3b406-1968-4b43-8fd7-a575e051d4f7
9da34ea9-b6cd-4d17-8c88-f080ecf89cdd
9da3f783-8cff-42bc-9ee9-98d264780718
9da35c39-0c86-42fa-a33f-e66e47a11068
(5 rows)
Thu 22 Dec 2022 08:24:21 AM UTC (every 2s)
id
--------------------------------------
9da4f46a-e427-45f2-a313-42f0fba948a2
9da48fdd-fa08-42ac-9e5f-3413a4f5317d
9da494bc-ec6b-493b-ac8f-d2f83a8f7be3
9da4561f-568e-4713-b87e-2af989377c09
9da477ac-02f8-46d9-b84b-3a8f54b31d98
(5 rows)
Thu 22 Dec 2022 08:24:23 AM UTC (every 2s)
You see that the first 4 bytes (because 0xffff
, the default, and the maximum) are the same within a 5 second interval (the first parameter which I've set to 5 instead of the default 60 seconds) with the others random.
Note that I have defined the primary key sharding as ASC rather than the default HASH because, if the goal is to colocate the rows inserted at the same time, you don't want to apply hash sharding.
I also verify that from concurrent sessions the prefix is the same:
for i in {1..10}
do
ysqlsh -tAq -h $(hostname) -c 'insert into demo select uuid_time_nextval() returning *,now()'&
done
Validating the compatibility with PostgreSQL
The best way to compare the behavior in YugabyteDB is to run the PostgreSQL regression test and compare with the expected output:
yum install -y git
git clone https://github.com/tvondra/sequential-uuids.git
ysqlsh -h $(hostname) --echo-all --quiet \
--file sequential-uuids/test/sql/uuids.sql |
sdiff sequential-uuids/test/expected/uuids.out -
The only difference is because the regression test has no ORDER BY and the order of rows from a hash partitioned table (the default) in a distributed database is different than from the single-node heap table:
But the result is the same.
The full PostgreSQL-compatibility of the extension is confirmed by the regression tests.
About Sequential UUID in YugabyteDB
⚠️ You must think about the consequence in a Distributed SQL database before using a time-based UUID. Thanks to the LSM-Tree storage, YugabyteDB doesn't have the problems that sequential_uuids
tries to solve (WAL write amplification, B-Tree fragmentation and clustering factor). If you want a UUID, then the one from pgcrypto
(already installed in YugabyteDB) gen_random_uuid()
is probably the right one.
In addition to that, having concurrent sessions touching the same key range will create a hotspot on one tablet. However, if there are not too many concurrent sessions, and the load is distributed with other tables, then maybe this UUID can benefit from colocation in DocDB and filesystem caches. But a cached sequence may be better as it keeps rows together from the same session, but distributes those from concurrent sessions.
Anyway, I've run the above for a while, with more rows inserted (from generate_series(1,100000)
) and checked that tablet auto-split occurs:
As you see, one is still at Split Depth 1 because no Sequential UUID went into this range, and another has been split 7 times. So YugabyteDB still balanced the storage.
Summary on the extension installation
If you think a PostgreSQL extension is necessary for your application, you can use this example to test it. The same can be used to deploy it. But to be sure that it is supported, do not hesitate to open a git issue of ask in the slack channel.