The problem
A common use case for AWS accounts is the creation of ephemeral platforms.
For development or integration environments in particular, we usually want to optimize costs and therefore shut down services when they are not needed.
In our case, this process is managed by Terraform, which creates and destroys the platform on a schedule driven by our CI/CD tool.
The database problem
However, customers of these create/destroy platforms often want their database to be pre-filled with test data so they can run their integration and functional tests.
Fortunately, RDS can take a snapshot of the database just before deletion, and that snapshot can be restored the next time the platform is created.
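For reference, outside Terraform this snapshot-then-restore cycle roughly corresponds to two AWS CLI calls. The sketch below is only illustrative: the function names, the `AWS` override variable and the example identifiers are assumptions, not part of our actual setup.

```shell
#!/bin/bash
# Illustrative sketch only: function names and the AWS override are assumptions.
# Set AWS=echo to print the arguments instead of calling the real CLI.
AWS="${AWS:-aws}"

# Delete an instance, asking RDS to take a final snapshot first.
delete_with_final_snapshot() {
  local db_id="$1" snapshot_id="$2"
  "$AWS" rds delete-db-instance \
    --db-instance-identifier "$db_id" \
    --no-skip-final-snapshot \
    --final-db-snapshot-identifier "$snapshot_id"
}

# Recreate an instance from a previously taken snapshot.
restore_from_snapshot() {
  local db_id="$1" snapshot_id="$2"
  "$AWS" rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier "$db_id" \
    --db-snapshot-identifier "$snapshot_id"
}
```

Terraform drives the same two operations through the skip_final_snapshot, final_snapshot_identifier and snapshot_identifier arguments of the RDS resource.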
And this is where things are gonna get messy.
The Terraform RDS resource problem
Let's check how the AWS RDS resource works with Terraform.
You have two options to create an RDS instance:
- Without a Snapshot
resource "aws_db_instance" "dbname" {
  allocated_storage         = 10
  identifier                = "db-instance-id"
  db_name                   = "dbname"
  engine                    = "postgres"
  engine_version            = data.aws_rds_engine_version.pg_version.version
  instance_class            = "db.t3.micro"
  username                  = "adminuser"
  password                  = random_password.admin.result
  skip_final_snapshot       = false
  final_snapshot_identifier = "${terraform.workspace}-${formatdate("YYYYMMDDhhmmss", timestamp())}"
  storage_encrypted         = true
  backup_retention_period   = 5
  backup_window             = "07:00-09:00"
  maintenance_window        = "Tue:05:00-Tue:07:00"
  vpc_security_group_ids = [
    aws_security_group.allow_postgres[0].id
  ]
  db_subnet_group_name = var.subnet_db_name
}
- With a Snapshot
resource "aws_db_instance" "dbname" {
  identifier                = "db-instance-id"
  db_name                   = "dbname"
  instance_class            = "db.t3.micro"
  skip_final_snapshot       = false
  final_snapshot_identifier = "${terraform.workspace}-${formatdate("YYYYMMDDhhmmss", timestamp())}"
  snapshot_identifier       = data.aws_db_snapshot.latest_snapshot.id
  storage_encrypted         = true
  backup_retention_period   = 5
  backup_window             = "07:00-09:00"
  maintenance_window        = "Tue:05:00-Tue:07:00"
  vpc_security_group_ids = [
    aws_security_group.allow_postgres[0].id
  ]
  db_subnet_group_name = var.subnet_db_name
}
As we can see, the same resource is configured differently depending on whether the snapshot_identifier property is set.
Before the first Terraform destroy
Before the first Terraform destroy, there is no snapshot to restore from, so the initial applies must use the first definition above. After the first destroy, a snapshot becomes available, and the RDS resource should use the second definition.
Can we come up with an RDS definition that works in all cases?
Turns out we can, with a few Terraform tricks.
The solution
Terraform Data should point to existing resources
The first thing to note is that the snapshot identifier to restore from comes from a Terraform data source:
snapshot_identifier = data.aws_db_snapshot.latest_snapshot.id
which is defined like this:
data "aws_db_snapshot" "latest_snapshot" {
  db_instance_identifier = "db-instance-id"
  most_recent            = true
}
But Terraform cannot read a data source that points to something that doesn't exist: the whole Terraform run fails. So we need to read the snapshot data source only if a snapshot already exists.
Reading data only if it exists
We first need to check whether a snapshot exists before reading it with Terraform.
So we make the following changes:
data "external" "rds_final_snapshot_exists" {
  program = [
    "./check-rds-snapshot.sh",
    "db-instance-${terraform.workspace}"
  ]
}
data "aws_db_snapshot" "latest_snapshot" {
  count                  = data.external.rds_final_snapshot_exists.result.db_exists ? 1 : 0
  db_instance_identifier = "db-instance-id"
  most_recent            = true
}
And the content of the check-rds-snapshot.sh script:
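#!/bin/bash
# Prints {"db_exists": "true"} or {"db_exists": "false"} for the Terraform external data source.
db_id=$1
if [ -z "${db_id}" ]; then
  # Send the usage message to stderr (>&2), not to a file named "2".
  echo "usage : $0 <db_id>" >&2
  exit 1
fi
# With --output text, the script expects the first word of the output to be DBSNAPSHOTS when a snapshot exists.
RESULT=($(aws rds describe-db-snapshots --db-instance-identifier "${db_id}" --output text 2> /dev/null))
aws_result=$?
if [ ${aws_result} -eq 0 ] && [[ ${RESULT[0]} == "DBSNAPSHOTS" ]]; then
  result='true'
else
  result='false'
fi
# The external data source only accepts string values, hence --arg.
jq -n --arg exists "${result}" '{"db_exists": $exists }'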
The external data source checks with the AWS CLI if the snapshot exists, and the count argument on the snapshot data source prevents Terraform from reading its value if none exists.
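To make the contract with the external data source more explicit: it expects the program to print a single JSON object whose values are all strings, which is why the flag is emitted as "true" or "false" rather than as a JSON boolean. Here is a minimal variant of the check, assuming a `length(DBSnapshots)` query is used to count snapshots instead of parsing the text output. The function names are hypothetical, and this is a sketch rather than the script we actually run.

```shell
#!/bin/bash
# Hypothetical variant of the check; function names are illustrative.

# Pure helper: turn a snapshot count into the JSON object printed for Terraform.
# Anything that is not a positive integer is treated as "no snapshot".
to_external_json() {
  local count="${1:-0}" exists
  case "$count" in
    ''|*[!0-9]*|0) exists=false ;;
    *) exists=true ;;
  esac
  # The value is quoted so that it is a JSON string, not a JSON boolean.
  printf '{"db_exists":"%s"}\n' "$exists"
}

# Count the snapshots of an instance; prints 0 if the CLI call fails.
count_snapshots() {
  local db_id="$1"
  aws rds describe-db-snapshots \
    --db-instance-identifier "$db_id" \
    --query 'length(DBSnapshots)' \
    --output text 2>/dev/null || echo 0
}

# Usage: to_external_json "$(count_snapshots "db-instance-id")"
```

Keeping the JSON formatting in a helper that takes a plain number makes it possible to test it without AWS credentials.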
Now we only need to combine the two RDS declarations into one that works every time!
resource "aws_db_instance" "dbname" {
  allocated_storage         = 10
  identifier                = "db-instance-id"
  db_name                   = "dbname"
  engine                    = "postgres"
  engine_version            = data.aws_rds_engine_version.pg_version.version
  instance_class            = "db.t3.micro"
  username                  = "adminuser"
  password                  = random_password.admin.result
  skip_final_snapshot       = false
  final_snapshot_identifier = "${terraform.workspace}-${formatdate("YYYYMMDDhhmmss", timestamp())}"
  snapshot_identifier       = try(data.aws_db_snapshot.latest_snapshot[0].id, null)
  storage_encrypted         = true
  backup_retention_period   = 5
  backup_window             = "07:00-09:00"
  maintenance_window        = "Tue:05:00-Tue:07:00"
  vpc_security_group_ids = [
    aws_security_group.allow_postgres[0].id
  ]
  db_subnet_group_name = var.subnet_db_name
  lifecycle {
    ignore_changes = [
      snapshot_identifier,
      final_snapshot_identifier
    ]
  }
}
And there you have it: a Terraform configuration that creates an RDS database and restores the latest snapshot if one exists.
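One way to check this behaviour end to end is to run a full create/destroy/create cycle in a throwaway workspace. The sketch below assumes a hypothetical `run_cycle` function and `TERRAFORM` override variable. Only the terraform subcommands themselves are real, and the comments describe what we would expect on a brand-new workspace.

```shell
#!/bin/bash
# Illustrative smoke test; the function name and TERRAFORM override are assumptions.
# Set TERRAFORM=echo for a dry run that only prints the subcommands.
TERRAFORM="${TERRAFORM:-terraform}"

run_cycle() {
  local workspace="$1"
  "$TERRAFORM" workspace select "$workspace" || return 1
  # First apply: no snapshot exists yet, so a fresh database is expected.
  "$TERRAFORM" apply -auto-approve || return 1
  # Destroy: RDS takes the final snapshot before deleting the instance.
  "$TERRAFORM" destroy -auto-approve || return 1
  # Second apply: a snapshot now exists, so the instance should be restored from it.
  "$TERRAFORM" apply -auto-approve
}

# Usage: run_cycle my-throwaway-workspace
```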
Thanks for reading! I’m Xavier, Cloud Developer at Stack Labs.
If you want to join an enthusiastic cloud dev team, please contact us.