During an application release, you may run some DDL and DML to bring the database from one version to another. If anything fails, for whatever reason, you don't want to leave it in an intermediate state and restore the initial state.
- This is an In-Place Point In Time Recovery (PITR) and can be achieved by restoring a backup and recovering the transactions until the initial point in time. But doing this in this way can be long, and the fallback procedure is part of the maintenance window, whatever the probability of the decision.
- Another solution is Transactional DDL where you run everything in a transaction that can be rolled back. PostgreSQL is quite nice there, supporting transactional DDL and being fast to rollback. But having long-running transactions may not be the best idea and you don't really need this because the application is stopped.
- The third solution is what Oracle calls Flashback Database, and Amazon Aurora calls Backtrack. It is similar to PITR but, rather than a restore + recover, it keeps a snapshot from the start of the maintenance window, easy to revert to.
The snapshot solution depends on the storage. YugabyteDB stores tables and indexes into LSM-Tree which are append-only. This is stored on the filesystem with SST Files, which are written once and never updated. With this technology, the snapshots are easy: nothing to write, just keep the files. It is a bit more complicated to synchronize metadata (the PostgreSQL catalog) that is stored on the yb-master. To automate this, we can create a snapshot schedule for the duration of the maintenance.
Example
I'm starting a RF=3 cluster on my laptop using my ybdemo/docker/yb-lab/ and I set two aliases, ysqlsh
, for SQL commands, and yb-admin
, for snapshot commands:
alias ysqlsh="\
docker exec -it yb-tserver-0 ysqlsh -h yb-tserver-0 \
"
alias yb-admin="\
docker exec -it yb-master-0 /home/yugabyte/bin/yb-admin \
--master_addresses $(echo yb-master-{0..2}:7100|tr ' ' ,)\
"
sh gen-yb-docker-compose.sh rf3
I create a database with a demo table:
ysqlsh
create database myapp;
\c myapp
create table mytable (a uuid primary key, b bigint unique);
create extension pgcrypto;
insert into mytable select gen_random_uuid(), generate_series(1,1000);
Here are my tables and indexes: mytable
with OID 16426 holds the table rows in its primary key structure, indexed on "a" and mytable_b_key
is the unique index on "b"
Take a snapshot
When the maintenance window starts, after stopping the application and before running the changes in the database, I take a snapshot by starting a snapshot schedule with yb-admin create_snapshot_schedule
, a frequency and retention, and the name of the database prefixed with ysql.
:
Franck@YB:/home/ybdemo/docker/yb-lab $
yb_sched=$(
yb-admin create_snapshot_schedule 60 2880 ysql.myapp |
tee /dev/stderr |
jq -r .schedule_id ;
) ; echo "${yb_sched}"
{
"schedule_id": "320000d2-8424-4930-aeb2-5994d1d5a345"
}
320000d2-8424-4930-aeb2-5994d1d5a345
This will take an hourly snapshot (every 60 minutes) that is kept for two days (2880 minutes). This allows to recover within 24h, without increasing the MVCC retention too much (one hour here). Here, the goal is to cover the maintenance window and be able to recover quickly by applying, at most, one hour of WAL. I need only one snapshot fort that, and I'll delete the snapshot schedule at the end.
The first snapshot is taken in the background. I want to be sure that the snapshot is complete:
Franck@YB:/home/ybdemo/docker/yb-lab $
while yb-admin list_snapshots | grep CREATING
do sleep 1 ; done
Franck@YB:/home/ybdemo/docker/yb-lab $
yb-admin list_snapshots
yb-admin list_snapshots
Snapshot UUID State Creation Time
c9290df0-dc3c-4819-88d8-8f32372497d1 COMPLETE 2022-06-14 16:35:50.661273
No snapshot restorations
This enables Point In Time Recovery (PITR) and allows to recover to any point in time after this first snapshot. As the goal is to revert back to the start of the maintenance window, I take note of this timestamp:
Franck@YB:/home/ybdemo/docker/yb-lab $
yb_pit0=$(
ysqlsh -qAt -c "
select to_char(now() at time zone 'utc','YYYY-MM-DD HH24:MI:SS')
" myapp
) ; echo "${yb_pit0}"
2022-06-14 16:36:03
I get it from the database. You can also get it from Linux date +%s
but I find it convenient to log it at the start of the DDL + DML that will be run.
Some DDL and DML
I this example I'll do two things:
- remove half of the rows, as if we purge old data
- change the primary key from column
a
to columnb
which is typically something that requires DDL + DML, though an intermediate table
ysqlsh myapp
begin transaction;
delete from mytable where b<=500;
commit;
alter table mytable rename to tmp_mytable;
create table mytable (a uuid unique, b bigint primary key);
insert into mytable (a,b) select a,b from tmp_mytable;
drop table tmp_mytable;
\q
Now that PITR is enabled, I still see the dropped table and index, because metadata is required in case of restoring a previous state of data. They are still there, but marked as "Hidden":
The new table, with "b" as the primary key, is the visible one:
Franck@YB:/home/ybdemo/docker/yb-lab $
ysqlsh -c '\d mytable' myapp;
Table "public.mytable"
Column | Type | Collation | Nullable | Default
--------+--------+-----------+----------+---------
a | uuid | | |
b | bigint | | not null |
Indexes:
"mytable_pkey" PRIMARY KEY, lsm (b HASH)
"mytable_a_key" UNIQUE CONSTRAINT, lsm (a HASH)
Franck@YB:/home/ybdemo/docker/yb-lab $
ysqlsh -c 'select count(*) from mytable' myapp
count
---------
500
(1 row)
Flashback
Let's say that this change doesn't work as expected, the application test shows that something fails with this new schema. We want to revert it back quickly and open the application again.
Franck@YB:/home/ybdemo/docker/yb-lab $
yb-admin restore_snapshot_schedule ${yb_sched} "${yb_pit0}"
{
"snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
"restoration_id": "727a168b-6666-42a1-b67f-4f5148180f88"
}
The restore happens in the background. I want to be sure that the restore is complete:
Franck@YB:/home/ybdemo/docker/yb-lab $
while yb-admin list_snapshots | grep RESTORING
do sleep 1 ; done
yb-admin list_snapshots
Snapshot UUID State Creation Time
c9290df0-dc3c-4819-88d8-8f32372497d1 COMPLETE 2022-06-14 16:35:50.661273
6c223f6c-b033-41dc-a1d9-404607b79252 COMPLETE 2022-06-14 16:37:31.195093
Restoration UUID State
727a168b-6666-42a1-b67f-4f5148180f88 RESTORED
Verification
Now, at this point in time, the new tables are hidden and the old ones are visible:
Both metadata and data have been restored as of the initial point in time:
Franck@YB:/home/ybdemo/docker/yb-lab $
ysqlsh -c '\d mytable' myapp;
Table "public.mytable"
Column | Type | Collation | Nullable | Default
--------+--------+-----------+----------+---------
a | uuid | | not null |
b | bigint | | |
Indexes:
"mytable_pkey" PRIMARY KEY, lsm (a HASH)
"mytable_b_key" UNIQUE CONSTRAINT, lsm (b HASH)
Franck@YB:/home/ybdemo/docker/yb-lab $
ysqlsh -c 'select count(*) from mytable' myapp
count
---------
1000
(1 row)
All is back as of 16:36:03 - tables and metadata
Forward and backward
In the list_snapshot
above we have seen two snapshots because one was taken before the restore. This means that, in case of doubt, we can roll forward and backward within this timeline:
Franck@YB:/home/ybdemo/docker/yb-lab $
yb-admin restore_snapshot_schedule ${yb_sched} "2022-06-14 16:37:31"
{
"snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
"restoration_id": "ab81a8f8-eee1-410a-9e06-65edb6248160"
}
Franck@YB:/home/ybdemo/docker/yb-lab $
ysqlsh -c 'select count(*) from mytable' myapp
count
-------
500
(1 row)
Franck@YB:/home/ybdemo/docker/yb-lab $
yb-admin restore_snapshot_schedule ${yb_sched} "2022-06-14 16:37:00"
{
"snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
"restoration_id": "edc3981d-fc77-4564-b77a-05d2e05d5483"
}
Franck@YB:/home/ybdemo/docker/yb-lab $
ysqlsh -c 'select count(*) from mytable' myapp
count
-------
500
(1 row)
Franck@YB:/home/ybdemo/docker/yb-lab $
yb-admin restore_snapshot_schedule ${yb_sched} "2022-06-14 16:36:30"
{
"snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
"restoration_id": "3c51073f-2d43-418c-b468-29b3c2fee160"
}
Franck@YB:/home/ybdemo/docker/yb-lab $
ysqlsh -c 'select count(*) from mytable' myapp
count
-------
1000
(1 row)
Franck@YB:/home/ybdemo/docker/yb-lab $
yb-admin restore_snapshot_schedule ${yb_sched} "2022-06-14 16:36:45"
{
"snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
"restoration_id": "d2e137f8-86fa-41bf-ace8-864bf1572fc3"
}
Franck@YB:/home/ybdemo/docker/yb-lab $
ysqlsh -c 'select count(*) from mytable' myapp
ERROR: relation "mytable" does not exist
LINE 1: select count(*) from mytable
^
Franck@YB:/home/ybdemo/docker/yb-lab $
yb-admin restore_snapshot_schedule ${yb_sched} "${yb_pit0}"
{
"snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
"restoration_id": "4ef52933-a40d-4764-8dfd-f01f6298b966"
}
Franck@YB:/home/ybdemo/docker/yb-lab $
ysqlsh -c 'select count(*) from mytable' myapp
count
-------
1000
(1 row)
You see here that I was able to go back to the future, just before the restore, then back in time again, finding when the table had 1000 rows before my delete, and even when the table was renamed but not yet created, and finally back to the initial state as it was my goal.
Delete schedule
When all is validated, there's no need to keep the snapshots. Of course you can, if you want to be able to do point in time recovery, but for this example I need it only to cover the maintenance window.
Franck@YB:/home/ybdemo/docker/yb-lab $
yb-admin delete_snapshot_schedule ${yb_sched}
{
"schedule_id": "320000d2-8424-4930-aeb2-5994d1d5a345"
}
Franck@YB:/home/ybdemo/docker/yb-lab $
yb-admin list_snapshots
No snapshots
Restoration UUID State
727a168b-6666-42a1-b67f-4f5148180f88 RESTORED
ab81a8f8-eee1-410a-9e06-65edb6248160 RESTORED
4ef52933-a40d-4764-8dfd-f01f6298b966 RESTORED
edc3981d-fc77-4564-b77a-05d2e05d5483 RESTORED
3c51073f-2d43-418c-b468-29b3c2fee160 RESTORED
d2e137f8-86fa-41bf-ace8-864bf1572fc3 RESTORED
I have no snapshot anymore. Just some traces of my restores.
And all is clean, no hidden table:
Summary
YugabyteDB doesn't support transactional DDL yet, but the need to protect a maintenance window full of DDL and DML is achieved with fast Point In Time Recovery. Doing the same with PostgreSQL is not easy and requires a copy-on-write filesystem which impacts the performance. YugabyteDB storage, in SST Files, allows fast snapshots, on each node. This, combined with YugabyteDB implementation of MVCC (Multi-Version Concurrency Control) with Hybrid Logical Clock timestamps, makes it possible to travel in time, to a state that is consistent across all nodes. The MVCC retention, usually set by timestamp_history_retention_interval_sec
for transaction snapshots, is internally increased to cover down to the previous PITR snapshot.
Note: PITR is work in progress, there are currently (2.13) some limitation. They are documented and the roadmap is in PITR: Tracking issue #7120.