❔ About
So you're both a - maybe early - PostgreSQL fan... and a recent duckdb
adopter.
You like both databases for their strengths and ecosystems... and wonder how it would be
possible to seamlessly send data from/to each other databases... without having to code anything, I mean nothing more that playing
sql shell
commands in a terminal
: no Python, no Java,...
👉 Well this is exactly what I will talk about in this post thanks to :
🤔 ... but why this post ?
The 3 main reasons of this article and why to pull/put from PostgreSQL
/duckdb
at this point are :
DuckDB |
PostgreSQL |
|
---|---|---|
Database Model | Columnar database | Relational database |
License | MIT | BSD like |
Serverless | Yes | No |
🎯 What we'll do
We will, only from terminal :
- 🐋 Install & boot a containerized PostgreSQL database (with Podman)
- 🐘 Create a database
- 🔁 Create and feed a little table
-
🪄 Read the
psql
table fromduckdb
-
🗜️ Export the
psql
table to a parquet file -
🔬 Inspect
parquet
file withparquet-cli
Also we will do the reverse move :
- Create a table in PostgreSQL from within
duckdb
- Test table contents from
sql
🍿 Demo
📜 shell
scripts
Install & boot a PostgreSQL instance:
export PGPASSWORD=mysecretpassword
# Boot a postgresql instance
podman run --name postgres -e POSTGRES_PASSWORD=$PGPASSWORD -d\
-p 5432:5432 docker.io/library/postgres
# Check container status
podman ps -a
Install psql
so we can reach PostgrSQL from outside the contenair:
# Install `psql` on the host so the database can be accessed
# from outside de container
sudo apt install -y postgresql-client
Now, create some PostgreSQL objects:
# Create a demo database
psql -h localhost -p 5432 -U postgres -c "CREATE DATABASE demo;"
# Create a table
psql -h localhost -p 5432 -U postgres -d demo\
-c "CREATE table customers(id varchar primary key);"
# Feed the PostgreSQL table with some data
psql -h localhost -p 5432 -U postgres -d demo\
-c "insert into customers values \
('Duffy duck'),\
('Daisy Duck'),\
('Donald Duck'),\
('Ludwig Von Drake');"
Install duckdb
:
# (Quick and dirty) duckdb install
wget https://github.com/duckdb/duckdb/releases/download/v1.0.0/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip
cp duckdb /usr/bin/
rm duckdb duckdb_cli-linux-amd64.zip
Now do the fun stuff...
Reach PostgreSQL database from duckdb
Let's reach postgres database from duckdb
:
duckdb -c "ATTACH 'dbname=demo user=postgres password=mysecretpassword host=127.0.0.1'\
AS db (TYPE POSTGRES, READ_ONLY);
show all tables;
select * from db.customers;
COPY db.customers TO 'db.customers.parquet' (FORMAT PARQUET);"
... then check the output parquet
file:
ls -ltr
file db.customers.parquet
... and read the resulting parquet
file from duckdb
:
duckdb -c "select * from 'db.customers.parquet';"
Test resulting parquet
file with parquet-cli
:
pip install parquet-cli
parq -h
parq db.customers.parquet --count
parq db.customers.parquet --head
parq db.customers.parquet --tail
duckdb
➡️ PostgreSQL
Let's:
-
"Attach" the remote PostgreSQL instance from
duckdb
runtime - Create a table
- Feed the table
-
Select table contents from
psql
duckdb -c "ATTACH 'dbname=demo user=postgres password=mysecretpassword host=127.0.0.1'</span>
AS db (TYPE POSTGRES);</span>
create table db.heroes(name varchar primary key);</span>
insert into db.heroes values</span>
('Dumbo'),</span>
('Man-Elephant'),</span>
('Tantra'),</span>
('Elephant Man'),</span>
('The Elephantmen'),</span>
('Mammomax') ;
"
psql -h localhost -p 5432 -U postgres -d demo</span>
-c "select * from heroes;"
⚖️ More about DuckDB
vs PostgreSQL
See below this very synthetic breakdown from influxdata: