Originally Posted on medium.com
Migrating data from relational databases to cloud-based data lakes is a common task in modern data architectures. Amazon Web Services (AWS) offers a robust solution called AWS Database Migration Service (DMS) that simplifies this process. This service is particularly useful for incremental migrations, allowing you to replicate ongoing changes in your source databases to a target destination like Amazon S3.
In this article, we’ll set up an incremental migration from an Amazon RDS database to an S3 bucket using Terraform.
Why Use AWS DMS?
AWS DMS provides several advantages:
- Minimal Downtime: Supports continuous data replication, which is essential for reducing downtime during migrations.
- Versatility: Supports various source and target database engines, making it flexible for different use cases.
- Ease of Use: Simple to set up and monitor, with a pay-as-you-go pricing model.
Architecture Overview
The migration setup involves:
- Source Database: An Amazon RDS instance (e.g., MySQL, PostgreSQL).
- Target Data Store: An Amazon S3 bucket where the data will be stored.
- AWS DMS: The service used to perform the migration, including replication instances, endpoints, and tasks.
Setting Up the Environment with Terraform
Terraform is an Infrastructure as Code (IaC) tool that allows you to define and provision infrastructure using a high-level configuration language. Here’s how to set up the environment:
Step 1: Define Provider and Variables
First, set up your provider and variables in a main.tf file:
provider "aws" {
  region = "us-west-2"
}

variable "rds_instance_id" {
  description = "The RDS instance identifier"
  type        = string
}

variable "s3_bucket_name" {
  description = "The name of the S3 bucket"
  type        = string
}
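It can also help to pin the Terraform and provider versions so the configuration behaves the same on every machine. The block below is a minimal sketch; the version constraints are assumptions, not values from the original setup, so adjust them to whatever you have actually tested against.

```hcl
terraform {
  # Assumed minimum versions; change these to match your environment.
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.9"
    }
  }
}
```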
Step 2: Create the S3 Bucket
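# Written for AWS provider v4 or later, where versioning and encryption are
# separate resources. The acl argument is dropped: new buckets are private by default.
resource "aws_s3_bucket" "target_bucket" {
  bucket = var.s3_bucket_name
}

resource "aws_s3_bucket_versioning" "target_bucket" {
  bucket = aws_s3_bucket.target_bucket.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "target_bucket" {
  bucket = aws_s3_bucket.target_bucket.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}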
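Since the bucket will hold a copy of your database contents, you may also want to block public access explicitly. This is an optional sketch that was not part of the original setup; the resource label target_bucket is just reused for readability.

```hcl
# Optional hardening: reject public ACLs and public bucket policies.
resource "aws_s3_bucket_public_access_block" "target_bucket" {
  bucket                  = aws_s3_bucket.target_bucket.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```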
Step 3: Define IAM Role for DMS
AWS DMS needs an IAM role it can assume to write to the S3 bucket. Access to the RDS instance itself goes through the database credentials on the source endpoint rather than IAM:
# Role that the DMS service assumes when writing to S3.
resource "aws_iam_role" "dms_role" {
  name = "dms-access-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "dms.amazonaws.com"
      }
    }]
  })
}

# Scoped to the target bucket rather than "*" on all of S3, RDS, and DMS.
resource "aws_iam_policy" "dms_policy" {
  name = "dms-access-policy"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["s3:PutObject", "s3:DeleteObject", "s3:PutObjectTagging"]
        Resource = "${aws_s3_bucket.target_bucket.arn}/*"
      },
      {
        Effect   = "Allow"
        Action   = ["s3:ListBucket"]
        Resource = aws_s3_bucket.target_bucket.arn
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "dms_attach_policy" {
  role       = aws_iam_role.dms_role.name
  policy_arn = aws_iam_policy.dms_policy.arn
}
Step 4: Define DMS Endpoints
Create the source and target endpoints for DMS:
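# Look up the RDS instance so the endpoint follows its actual address and port.
data "aws_db_instance" "source" {
  db_instance_identifier = var.rds_instance_id
}

resource "aws_dms_endpoint" "source_endpoint" {
  endpoint_id   = "rds-source-endpoint"
  endpoint_type = "source"
  engine_name   = "mysql"
  username      = "your-db-username" # placeholder
  password      = "your-db-password" # placeholder; never commit real credentials
  server_name   = data.aws_db_instance.source.address
  port          = data.aws_db_instance.source.port
  database_name = "your-database-name"
}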
resource "aws_dms_endpoint" "target_endpoint" {
  endpoint_id   = "s3-target-endpoint"
  endpoint_type = "target"
  engine_name   = "s3"
  s3_settings {
    bucket_name      = aws_s3_bucket.target_bucket.bucket
    bucket_folder    = "dms-data"
    compression_type = "GZIP"
    # Role from Step 3 that DMS assumes to write to the bucket.
    service_access_role_arn = aws_iam_role.dms_role.arn
  }
}
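The source endpoint above uses placeholder credentials. One way to keep real values out of the configuration is to pass them in as variables, with the password marked sensitive. This is a sketch; the variable names db_username and db_password are hypothetical and do not appear elsewhere in this article.

```hcl
# Hypothetical variable names for the database credentials.
variable "db_username" {
  description = "Database user that DMS connects as"
  type        = string
}

variable "db_password" {
  description = "Password for the DMS database user"
  type        = string
  sensitive   = true # keeps the value out of plan and apply output
}
```

You would then set username = var.db_username and password = var.db_password on the source endpoint and supply the values at run time, for example through the TF_VAR_db_username and TF_VAR_db_password environment variables. Keep in mind that sensitive only hides the value in CLI output; it is still stored in the Terraform state, so the state itself needs to be protected.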
Step 5: Create a DMS Replication Instance
The replication instance is the managed compute that connects to both endpoints and runs the migration task:
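resource "aws_dms_replication_instance" "replication_instance" {
  replication_instance_id    = "dms-replication-instance"
  replication_instance_class = "dms.t3.micro" # small burstable class; size up for real workloads
  allocated_storage          = 100            # in GB
  # Public access is often unnecessary when the source is in the same VPC;
  # consider false if the instance can reach S3 privately.
  publicly_accessible = true
  apply_immediately   = true
}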
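When DMS resources are created through the API or Terraform rather than the console, the account usually needs an IAM role named exactly dms-vpc-role so that DMS can manage network interfaces in your VPC. The sketch below assumes the role does not already exist in your account; if it does, skip it or import it instead. The Terraform resource labels are my own choice.

```hcl
# DMS looks for this role by name, so the name must be exactly "dms-vpc-role".
resource "aws_iam_role" "dms_vpc_role" {
  name = "dms-vpc-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "dms.amazonaws.com"
      }
    }]
  })
}

# AWS-managed policy that lets DMS manage VPC networking on your behalf.
resource "aws_iam_role_policy_attachment" "dms_vpc_role" {
  role       = aws_iam_role.dms_vpc_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonDMSVPCManagementRole"
}
```

If you add this, it is probably worth adding depends_on = [aws_iam_role_policy_attachment.dms_vpc_role] to the replication instance so the role is in place before the instance is created.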
Step 6: Create a DMS Replication Task
Define the task that will perform the migration:
resource "aws_dms_replication_task" "replication_task" {
  replication_task_id = "rds-to-s3-task"
  migration_type      = "full-load-and-cdc"
  # path.module keeps the file lookups relative to this configuration.
  table_mappings            = file("${path.module}/table-mappings.json")
  replication_task_settings = file("${path.module}/task-settings.json")
  source_endpoint_arn       = aws_dms_endpoint.source_endpoint.endpoint_arn
  target_endpoint_arn       = aws_dms_endpoint.target_endpoint.endpoint_arn
  replication_instance_arn  = aws_dms_replication_instance.replication_instance.replication_instance_arn
}
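The table-mappings.json file defines which schemas and tables to migrate, and the task-settings.json file contains task-level settings such as logging and error handling. Note that Terraform only creates the task; you still need to start it from the console or CLI, or set start_replication_task = true on the resource.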
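If you would rather not maintain separate JSON files, the same content can be built inline with jsonencode. The sketch below is one possible shape, not the contents of the original files: it selects every table in a single schema and turns on task logging. The schema name reuses the placeholder from the source endpoint, and the local names are hypothetical.

```hcl
locals {
  # Selection rule that includes every table in one schema.
  table_mappings = jsonencode({
    rules = [
      {
        "rule-type" = "selection"
        "rule-id"   = "1"
        "rule-name" = "include-all-tables"
        "object-locator" = {
          "schema-name" = "your-database-name" # placeholder
          "table-name"  = "%"                  # wildcard: all tables
        }
        "rule-action" = "include"
      }
    ]
  })

  # Only the settings being overridden here; the rest are left to DMS.
  task_settings = jsonencode({
    Logging = {
      EnableLogging = true
    }
  })
}
```

In the task you would then use table_mappings = local.table_mappings and replication_task_settings = local.task_settings in place of the file() calls. Task logging typically also requires an IAM role named dms-cloudwatch-logs-role to exist in the account.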
Configuring the Incremental Migration
To enable incremental migration (also known as Change Data Capture or CDC), the DMS task must be configured to continuously capture changes from the source database after the initial full load. This is controlled by the migration_type parameter, set to "full-load-and-cdc" in the DMS replication task.
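Ensure that the source database is configured to support CDC. For MySQL this means enabling binary logging with binlog_format set to ROW (on Amazon RDS, automated backups must also be enabled for binary logs to be written); for PostgreSQL it means enabling logical replication.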
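For an RDS MySQL source, the binary log settings can be managed with a DB parameter group. The following is a sketch under a few assumptions: the group name is hypothetical, the family assumes MySQL 8.0, and the RDS instance (which is not managed in this article's configuration) still has to be pointed at the parameter group separately.

```hcl
# Hypothetical parameter group; the family assumes a MySQL 8.0 instance.
resource "aws_db_parameter_group" "mysql_cdc" {
  name   = "mysql-cdc-params"
  family = "mysql8.0"

  # Row-based logging is what DMS reads for change data capture.
  parameter {
    name  = "binlog_format"
    value = "ROW"
  }

  # Log full row images so every change carries all column values.
  parameter {
    name  = "binlog_row_image"
    value = "FULL"
  }
}
```

It is also worth checking how long the instance retains its binary logs. On RDS MySQL this is set from a SQL session with the mysql.rds_set_configuration procedure, and a retention of around 24 hours is a common starting point.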
Monitoring and Managing the Migration
AWS DMS publishes replication instance and task metrics to Amazon CloudWatch and, when task logging is enabled, writes task logs to CloudWatch Logs. You can view these in the AWS Management Console or set up CloudWatch alarms to notify you of any issues.
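As an illustration, an alarm like this can be defined alongside the rest of the configuration. This is a sketch only: the SNS topic, the alarm name, and the 80% threshold are assumptions, and it watches CPU on the replication instance as a simple starting point. Task-level metrics such as CDC latency can be alarmed on the same way once you have confirmed the dimensions for your task in CloudWatch.

```hcl
# Hypothetical SNS topic to receive alarm notifications.
resource "aws_sns_topic" "dms_alerts" {
  name = "dms-alerts"
}

# Fires when average CPU stays above 80% for three 5-minute periods.
resource "aws_cloudwatch_metric_alarm" "dms_cpu_high" {
  alarm_name          = "dms-replication-instance-cpu-high"
  namespace           = "AWS/DMS"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 3
  comparison_operator = "GreaterThanThreshold"
  threshold           = 80
  alarm_actions       = [aws_sns_topic.dms_alerts.arn]

  dimensions = {
    ReplicationInstanceIdentifier = aws_dms_replication_instance.replication_instance.replication_instance_id
  }
}
```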
Thank you for reading. If you have anything to add, please send a response or add a note!
Happy migrating! 🚀