You have a production YugabyteDB and you need to create multiple developer databases. To achieve this, you may have to anonymize certain data and prepare a database that the developers can use. Furthermore, the developers want to run it in their own Docker container. The question is, how can you load this database into new containers?
One solution is to use ysql_dump to export the database and import it into an empty developer container. In YugabyteDB, this may take several minutes for large databases with thousands of tables, indexes, and referential integrity constraints.
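As a rough sketch of that dump-and-load path, the two shell functions below wrap the export and the import. The host name `staging-host` in the usage comment is hypothetical, and the sketch assumes that `ysql_dump` and `psql` are on the PATH, that the default port 5433 and the `yugabyte` user are in use, and that the source database has already been anonymized.

```shell
# Export one database to a plain SQL file with ysql_dump.
# Arguments: host, database name, output file.
export_db() {
  ysql_dump -h "$1" -p 5433 -U yugabyte -d "$2" -f "$3"
}

# Create an empty database in the developer container, then load the dump,
# stopping at the first error.
# Arguments: host, database name, dump file.
import_db() {
  psql -h "$1" -p 5433 -U yugabyte -c "create database \"$2\"" &&
  psql -h "$1" -p 5433 -U yugabyte -d "$2" -v ON_ERROR_STOP=1 -f "$3"
}

# Usage (staging-host is a hypothetical name):
#   export_db staging-host demo demo.dump
#   import_db localhost demo demo.dump
```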
A faster solution is to copy the physical files. Doing this in a production cluster can be complex because it requires taking snapshots on all yb-tserver nodes and exporting the metadata from yb-master. In development, if you have a single node started with yugabyted, the process is much simpler: all data and metadata are contained in the --base_dir, which by default is set to /root/var.
There are two solutions: build a Docker image containing the data directory or use an external volume.
Docker image containing the database files
To create a Docker image, the following Dockerfile starts a yugabyte instance, initializes the database from the scripts in an initial directory, and copies the data directory to a new image.
Here is my Dockerfile:
# Stage 1: start YugabyteDB once so that it runs the initialization scripts
FROM yugabytedb/yugabyte AS init
RUN mkdir /initial_scripts_dir
ADD . /initial_scripts_dir
RUN bin/yugabyted start --advertise_address=0.0.0.0 \
    --background=true --base_dir=/root/var \
    --initial_scripts_dir=/initial_scripts_dir
RUN yugabyted stop
# Stage 2: copy to a new image without the /initial_scripts_dir
FROM yugabytedb/yugabyte AS base
COPY --from=init /root/var /root/var
WORKDIR /root/var/logs
CMD yugabyted start --background=false
I have a file called demo.dump that was generated by ysql_dump, and I created demo.sql to create the database and load the dump. When yugabyted starts with --initial_scripts_dir, it runs all .sql and .cql files in that directory. I use a different extension for the dump so that it is not run directly: my .sql file includes it, and the dump itself stays untouched.
Here is my demo.sql:
create database demo;
\set ON_ERROR_STOP on
\c demo
\ir demo.dump
This script runs when the image build starts yugabyted.
Using this Dockerfile and the .sql scripts in the current directory, I build the yb-dev image:
docker build -t yb-dev .
To use this image, a developer simply creates a container and connects to it:
docker run -d --name yb-dev1 -p5433:5433 -p 15433:15433 yb-dev
psql -h localhost -p 5433 -U yugabyte -d demo
By default, Docker uses the overlay2 storage driver that copies entire files to the container layer when they are written to. The good news is that, for YugabyteDB, this is not a major issue, as the largest files are immutable SST files that do not change. Any new data goes to new files, WAL or SST, meaning existing files are not modified.
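One way to check this on a running container is `docker diff`, which lists the paths added (A), changed (C), or deleted (D) relative to the image. The helper below is a sketch that summarizes that output. Matching on `.sst` in the path is an assumption about how the SST files are named, and `yb-dev1` is the container started above.

```shell
# Read `docker diff` output on stdin and count added paths, changed paths,
# and changed paths that look like SST files (assumed to contain ".sst").
summarize_diff() {
  awk '
    $1 == "A" { added++ }
    $1 == "C" { changed++; if ($2 ~ /\.sst/) sst_changed++ }
    END { printf "added=%d changed=%d sst_changed=%d\n", added, changed, sst_changed }
  '
}

# Usage, with the container from the docker run command above:
#   docker diff yb-dev1 | summarize_diff
```

The `changed` count also includes directories whose contents changed, so it is normally above zero. If existing SST files are never rewritten, `sst_changed` should stay at 0 while `added` grows as new WAL and SST files appear.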
External volume for the database directory
If your developers prefer to use a standard image containing only the binaries and store the database in an external volume, you can extract a tarball with the initial database base directory:
docker run -i yb-dev tar -zcvf - /root/var > demo.tgz
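Note that tar drops the leading `/` from member names when it creates the archive, so the files are stored as `root/var/...` and later extract relative to the current directory. The following scratch example, which uses a temporary directory rather than the real database files, shows the stored names:

```shell
# Scratch demo with a temporary directory (not the real database files).
scratch=$(mktemp -d)
mkdir -p "$scratch/var/conf"
touch "$scratch/var/conf/example.conf"

# Archive an absolute path to stdout, as in the command above, and list the
# stored member names. tar's notice about the leading "/" goes to stderr.
first_member=$(tar -zcf - "$scratch/var" 2>/dev/null | tar -tzf - | head -n 1)
echo "stored as: $first_member"

# Clean up the scratch directory.
rm -rf "$scratch"
```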
To use it, developers can extract it and start the YugabyteDB container by specifying the volume:
tar --sparse -zxvf demo.tgz
docker run -d --name yb-dev2 -p5433:5433 -p15433:15433 \
-v $(pwd)/root/var:/root/var \
yugabytedb/yugabyte \
yugabyted start --advertise_address=0.0.0.0 --background=false
psql -h localhost -p 5433 -U yugabyte -d demo
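Because the container is started in the background with `-d`, the first `psql` may run before the database accepts connections. A small retry helper, sketched here in plain POSIX shell, can cover that gap. The helper name and the attempt count and delay in the usage comment are arbitrary choices for illustration.

```shell
# retry <attempts> <delay_seconds> <command...>
# Run the command until it succeeds or the attempts are used up.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage, with the container started above:
#   retry 30 2 psql -h localhost -p 5433 -U yugabyte -d demo -c 'select 1'
```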
This works only for single-container development clusters started with yugabyted. To copy the physical files from a multi-node cluster, you need to take distributed snapshots so that all data and metadata are consistent.
You might be wondering why it's necessary to copy the entire base directory /root/var instead of just /root/var/data. The reason is that the UUIDs of the universe, the yb-master, and the yb-tserver are stored in /root/var/conf. When opening an existing database, these values must match the data directory, so it's essential to copy the entire base directory.
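To see these identifiers before shipping a base directory, you can search its conf directory for UUID entries. This is only a sketch: it assumes the values are kept in plain-text files with `uuid` in the key name, and the path in the usage comment is the one extracted from the tarball above.

```shell
# Print every line mentioning "uuid" (case-insensitive) in files under a
# conf directory. Assumption: the identifiers are kept in plain-text files.
show_uuids() {
  grep -rih 'uuid' "$1"
}

# Usage, from the directory where demo.tgz was extracted:
#   show_uuids root/var/conf
```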
You can ship the database to a developer either as a Docker image or as an external volume to mount. In both cases, the container starts up immediately since it opens an already existing database.