Building a Resilient AWS Infrastructure with Terraform

Piyush Bagani - Jul 16 - Dev Community

Building resilient and scalable infrastructure is critical in today's era, where downtime or poor performance can directly impact customer satisfaction and business revenue. This blog explores the setup of a high availability architecture within Amazon Web Services (AWS) using Terraform, an Infrastructure as Code (IaC) tool. By the end of this guide, you'll understand how to use Terraform to create a fault-tolerant architecture that supports robust, scalable web applications.

Why Use Terraform?

Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. It supports numerous providers, including AWS, and lets you describe infrastructure declaratively in a high-level configuration language (HCL), then preview every change with terraform plan before applying it. Because the same workflow applies to every provider, Terraform works well for multi-cloud and complex setups and for managing sophisticated cloud environments.

Project Setup Overview

We aim to deploy a VPC in AWS with all the necessary components to support a fault-tolerant, scalable web server. This includes:

  • A VPC with separate public and private subnets across multiple Availability Zones.
  • NAT Gateways to give instances in private subnets outbound internet access. To improve resiliency, one NAT Gateway is deployed in the public subnet of each AZ.
  • An Application Load Balancer to distribute incoming traffic across the instances.
  • An Auto Scaling Group that keeps instances running in both AZs and can scale between a minimum and a maximum size.
  • EC2 instances placed in the private subnets for additional security.

The full architecture diagram is shown below:

[Image: architecture diagram]

Reference link: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-example-private-subnets-nat.html

NOTE: Unlike the AWS reference architecture above, this setup does not provision the S3 gateway endpoint.

Prerequisites

  1. Install Terraform
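    • Install Terraform for your operating system.
  2. Access to an AWS account
    • Sign up for an AWS account if you don't have one. The t2.micro instances may be partly covered by the Free Tier, but NAT Gateways are not: each NAT Gateway is billed per hour and per GB of data processed, and public IPv4 addresses such as the Elastic IPs are also billed hourly. Run terraform destroy when you are done to avoid ongoing charges.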

Let's provision this infrastructure step by step.

Structure of the Terraform configuration

aws-vpc-subnet-architecture/
├── aws_alb.tf
├── aws_asg.tf
├── aws_networking.tf
├── outputs.tf
├── providers.tf
├── setup.sh
├── terraform.tfvars
└── variables.tf

We will discuss the purpose of each file one by one.

Provisioning the Networking Components:

Here's the code that provisions all the networking components:
aws_networking.tf

# Creating the VPC
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr # value defined in terraform.tfvars
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = merge(
    var.common_tags,
    {
      Name = var.vpc_name
    }
  )
}

# Creating the Internet Gateway
resource "aws_internet_gateway" "internet_gateway" {
  vpc_id = aws_vpc.main.id

  tags = merge(
    var.common_tags,
    {
      Name = "Main Internet Gateway"
    }
  )
}

# Creating the Public Subnets
resource "aws_subnet" "public_subnet" {
  count                   = length(var.public_subnet_cidrs)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.public_subnet_cidrs[count.index]
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = merge(
    var.common_tags,
    {
      Name = "Public Subnet ${count.index + 1}"
    }
  )
}

# Creating the Private Subnets
resource "aws_subnet" "private_subnet" {
  count             = length(var.private_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = merge(
    var.common_tags,
    {
      Name = "Private Subnet ${count.index + 1}"
    }
  )
}

# Creating the NAT Gateways with Elastic IPs
resource "aws_eip" "nat_eip" {
  count  = length(var.availability_zones)
  domain = "vpc"

  tags = merge(
    var.common_tags,
    {
      Name = "NAT EIP ${count.index + 1}"
    }
  )
}

resource "aws_nat_gateway" "nat_gateway" {
  count         = length(aws_subnet.public_subnet)
  allocation_id = aws_eip.nat_eip[count.index].id
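  subnet_id     = aws_subnet.public_subnet[count.index].id

  # A NAT Gateway in a public subnet relies on the VPC's Internet Gateway;
  # this makes Terraform create the Internet Gateway first.
  depends_on = [aws_internet_gateway.internet_gateway]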

  tags = merge(
    var.common_tags,
    {
      Name = "NAT Gateway ${count.index + 1}"
    }
  )
}

# Creating the Route Table for Public Subnet
resource "aws_route_table" "public_route_table" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.internet_gateway.id
  }

  tags = merge(
    var.common_tags,
    {
      Name = "Public Route Table"
    }
  )
}

resource "aws_route_table_association" "public_route_table_association" {
  count          = length(aws_subnet.public_subnet)
  subnet_id      = aws_subnet.public_subnet[count.index].id
  route_table_id = aws_route_table.public_route_table.id
}

# Creating the Route Table for Private Subnet
resource "aws_route_table" "private_route_table" {
  count  = length(aws_subnet.private_subnet)
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat_gateway[count.index].id
  }

  tags = merge(
    var.common_tags,
    {
      Name = "Private Route Table ${count.index + 1}"
    }
  )
}

resource "aws_route_table_association" "private_route_table_association" {
  count          = length(aws_subnet.private_subnet)
  subnet_id      = aws_subnet.private_subnet[count.index].id
  route_table_id = aws_route_table.private_route_table[count.index].id
}


This configuration builds the network layer of the architecture.

At its core, it creates a Virtual Private Cloud (VPC) with a custom IP address range (CIDR block) and with DNS support and DNS hostnames enabled. It then adds an Internet Gateway, which enables communication between the VPC and the internet and gives resources in the public subnets internet access.

Next, it creates one public and one private subnet in each Availability Zone. The public subnets host the NAT Gateways and the Application Load Balancer, the components that must interact with external clients. The private subnets hold the backend instances, which are isolated from direct internet access and rely on the NAT Gateways for outbound connections. One NAT Gateway, with its own Elastic IP, sits in each public subnet, so instances in the private subnets can reach the internet for updates and downloads while remaining unreachable from inbound internet traffic.

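The configuration also creates route tables to manage traffic flow. The single public route table sends internet-bound traffic (0.0.0.0/0) to the Internet Gateway, so resources in the public subnets can communicate with the internet. Each private subnet gets its own route table that sends internet-bound traffic (0.0.0.0/0) to the NAT Gateway in the same AZ; traffic between addresses inside the VPC uses the implicit local route present in every route table. Because each AZ has its own NAT Gateway, losing one AZ does not cut off outbound access for the private subnet in the other.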

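Finally, it associates each subnet with its route table, so each subnet follows the routing intended for it, whether exposure to the public internet or protected internal operation. Note that the subnets, NAT Gateways, and private route tables are paired by count.index, so the entries in availability_zones, public_subnet_cidrs, and private_subnet_cidrs must line up position by position.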
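A hypothetical output (not part of the original outputs.tf) can make the subnet-to-NAT pairing visible after apply. This sketch lists, for each private subnet keyed by its AZ, the NAT Gateway its route table targets and the AZ of the public subnet hosting that NAT Gateway; the output name private_subnet_nat_by_az is an assumption.

```hcl
# Hypothetical output: one entry per private subnet, keyed by its AZ.
output "private_subnet_nat_by_az" {
  description = "NAT Gateway used by the private subnet in each AZ"
  value = {
    for i, subnet in aws_subnet.private_subnet :
    subnet.availability_zone => {
      private_subnet_id = subnet.id
      # Same index as the 0.0.0.0/0 route in aws_route_table.private_route_table[i]
      nat_gateway_id = aws_nat_gateway.nat_gateway[i].id
      # AZ of the public subnet hosting that NAT Gateway; it should equal the key
      nat_gateway_az = aws_subnet.public_subnet[i].availability_zone
    }
  }
}
```

If the three input lists are aligned correctly, every entry's nat_gateway_az matches its key.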

Provisioning the Auto-Scaling Group and Launch Template:

aws_asg.tf: This file defines the launch template, the instance security group, and the Auto Scaling Group (ASG).

#  Creating Launch Template
resource "aws_launch_template" "app_lt" {
  name          = "app-launch-template"
  image_id      = var.ami_id
  instance_type = var.instance_type
  user_data     = base64encode(file("${path.module}/setup.sh")) # Setup script for web server

  vpc_security_group_ids = [aws_security_group.instance_sg.id]

  tag_specifications {
    resource_type = "instance"
    tags = merge(
      var.common_tags,
      {
        Name = "Instance Template"
      }
    )
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_security_group" "instance_sg" {
  name        = "instance-security-group"
  description = "Security group for instances"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb_sg.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(
    var.common_tags,
    {
      Name = "Instance Security Group"
    }
  )
}

# Creating Auto Scaling Group
resource "aws_autoscaling_group" "app_asg" {

  launch_template {
    id      = aws_launch_template.app_lt.id
    version = "$Latest"
  }

  min_size            = 1
  max_size            = 4
  desired_capacity    = 2
  vpc_zone_identifier = aws_subnet.private_subnet.*.id

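  tag {
    key                 = "Name"
    value               = "app-instance-${formatdate("YYYYMMDDHHmmss", timestamp())}"
    propagate_at_launch = true
  }

  # timestamp() returns a new value on every run, so without this the Name tag
  # would be reported as changed on every terraform plan. Note that this also
  # ignores any deliberate edits to the tag block.
  lifecycle {
    ignore_changes = [tag]
  }
}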
This section of the configuration automates the deployment and management of EC2 instances, with a focus on scalability, security, and consistent configuration. It sets up a Launch Template, a Security Group, and an Auto Scaling Group.

The Launch Template acts as a blueprint for the instances: it specifies the Amazon Machine Image (AMI), the instance type, and user data containing a setup script (setup.sh) that installs and starts the web server. This ensures every instance is configured identically. The template attaches a security group that acts as a virtual firewall for the instances. It allows inbound HTTP traffic on port 80 only from the load balancer's security group (alb_sg), so the instances cannot be reached directly from the internet, and it permits all outbound traffic so the instances can download updates and call external APIs through the NAT Gateways.

The Auto Scaling Group uses the launch template to create instances, so each new instance follows the predefined configuration. It spreads instances across the private subnets in both AZs (vpc_zone_identifier) and keeps the group at desired_capacity (2), within min_size (1) and max_size (4), replacing instances that fail health checks. Note that this configuration defines no scaling policy, so the group does not change its size in response to load on its own; a scaling policy is what lets it scale between the minimum and maximum.

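Each instance the group launches receives the Name tag through propagate_at_launch = true. Note that this tag is not unique per instance: timestamp() is evaluated by Terraform when it plans the change, not when an instance launches, so every instance gets the same value, the time of the Terraform run that last set the tag.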
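To have the group respond to load, it needs a scaling policy. A minimal sketch, assuming average CPU utilization is a reasonable load signal for this web server; the resource name, policy name, and 50% target are hypothetical values rather than part of the original setup:

```hcl
# Hypothetical addition to aws_asg.tf: target tracking keeps the group's
# average CPU near target_value by adding or removing instances, always
# staying within min_size (1) and max_size (4) of app_asg.
resource "aws_autoscaling_policy" "cpu_target_tracking" {
  name                   = "app-cpu-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.app_asg.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }

    # Assumed target (percent); tune it for the real workload.
    target_value = 50.0
  }
}
```

One design caveat: because app_asg sets desired_capacity, a later terraform apply resets the group to 2 instances even after the policy has scaled it. Adding desired_capacity to a lifecycle ignore_changes list on the group is a common way to let the policy own that value.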

Provisioning the Application Load Balancer:

aws_alb.tf: This file defines the Application Load Balancer, its security group, the target group, the HTTP listener, and the attachment between the target group and the Auto Scaling Group.

#  Creating Application Load Balancer (ALB)
resource "aws_lb" "app_lb" {
  name               = "aws-app-prod-lb"
  internal           = false
  load_balancer_type = "application"
  subnets            = aws_subnet.public_subnet.*.id

  security_groups = [aws_security_group.alb_sg.id]

  tags = merge(
    var.common_tags,
    {
      Name = "Application Load Balancer"
    }
  )
}

#  Creating a Security Group for the Load Balancer
resource "aws_security_group" "alb_sg" {
  name        = "alb-security-group"
  description = "Allow web traffic"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(
    var.common_tags,
    {
      Name = "ALB Security Group"
    }
  )
}


#  Creating a Target Group for ALB
resource "aws_lb_target_group" "tg" {
  name     = "aws-target-group"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    enabled             = true
    interval            = 30
    path                = "/"
    protocol            = "HTTP"
    timeout             = 5
    healthy_threshold   = 3
    unhealthy_threshold = 3
  }

  tags = merge(
    var.common_tags,
    {
      Name = "Target Group"
    }
  )
}

# Creating an HTTP listener that forwards requests to the Target Group
resource "aws_lb_listener" "front_end" {
  load_balancer_arn = aws_lb.app_lb.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.tg.arn
  }
}

# Attaching Target Group to Auto Scaling Group
resource "aws_autoscaling_attachment" "asg_attachment" {
  autoscaling_group_name = aws_autoscaling_group.app_asg.id
  lb_target_group_arn    = aws_lb_target_group.tg.arn
}

This part of the configuration sets up an Application Load Balancer (ALB), its dedicated security group, and a target group to distribute incoming web traffic across the EC2 instances. The ALB is internet-facing (internal = false), so it can accept traffic from the internet. It sits in the public subnets of both AZs and forwards HTTP requests on port 80 to the instances in the private subnets, so the application serves internet requests without the instances being directly exposed.

The target group runs an HTTP health check against / on each instance every 30 seconds (5-second timeout). An instance is marked unhealthy after 3 consecutive failed checks, roughly 90 seconds, and healthy again after 3 consecutive successes; the ALB routes requests only to healthy targets. Attaching the target group to the Auto Scaling Group (aws_autoscaling_attachment) means every instance the group launches is registered automatically and every instance it terminates is deregistered. Together with instances spread across two AZs, this produces a fault-tolerant architecture suited to high-availability web services.
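By default, an Auto Scaling Group only uses EC2 status checks (health_check_type = "EC2"), so an instance whose web server has stopped but whose VM is still running stays in the group even while the target group reports it unhealthy. A hedged sketch of the two arguments that make the group replace such instances; merge them into the existing aws_autoscaling_group.app_asg rather than declaring a second resource with the same name.

```hcl
# Fragment: arguments to add inside the existing resource in aws_asg.tf.
resource "aws_autoscaling_group" "app_asg" {
  # ... launch_template, sizes, vpc_zone_identifier and tag as before ...

  # Use the target group's HTTP health check (via the ALB), not only EC2
  # status checks, when deciding whether to replace an instance.
  health_check_type = "ELB"

  # Seconds after launch before health check results count, giving setup.sh
  # time to install and start httpd (300 is also the provider default).
  health_check_grace_period = 300
}
```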

A few more files contribute to provisioning this architecture.

  • outputs.tf: This file specifies the outputs of created resources.
# Output for VPC
output "vpc_id" {
  value       = aws_vpc.main.id
  description = "The ID of the VPC"
}

# Output for Public Subnets
output "public_subnet_ids" {
  value       = aws_subnet.public_subnet.*.id
  description = "The IDs of the public subnets"
}

# Output for NAT Gateways
output "nat_gateway_ids" {
  value       = aws_nat_gateway.nat_gateway.*.id
  description = "The IDs of the NAT gateways"
}

# Output for Private Subnets
output "private_subnet_ids" {
  value       = aws_subnet.private_subnet.*.id
  description = "The IDs of the private subnets"
}


# Output for Application Load Balancer
output "alb_dns_name" {
  value       = aws_lb.app_lb.dns_name
  description = "The DNS name of the Application Load Balancer"
}
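As a small, hypothetical addition, an output can build a ready-to-open URL from the ALB's DNS name, since the listener serves plain HTTP on port 80. The output name alb_url is an assumption.

```hcl
# Hypothetical extra output for outputs.tf
output "alb_url" {
  value       = "http://${aws_lb.app_lb.dns_name}"
  description = "HTTP URL of the Application Load Balancer"
}
```

After apply, terraform output -raw alb_url prints just the URL, which is convenient for pasting into a browser or a script.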
  • providers.tf: This file pins the AWS provider version and sets the region where the infrastructure will be provisioned.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "5.57.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}
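The provider above is pinned to exactly 5.57.0. A common alternative is a pessimistic version constraint, which accepts later 5.x releases while the dependency lock file records the exact version in use. This is a sketch, not the original file; the required_version value in particular is an assumption, since the original configuration does not set one.

```hcl
# Sketch of an alternative terraform block for providers.tf
terraform {
  # Assumed minimum CLI version; adjust to what your team uses.
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Allows 5.57.0 and any later 5.x release, but never 6.0 or above.
      version = "~> 5.57"
    }
  }
}
```

terraform init records the selected version in .terraform.lock.hcl; committing that file keeps everyone on the same version, and terraform init -upgrade moves to a newer release within the constraint.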
  • variables.tf: This file declares the variables used in the Terraform configuration. Some variables also have default values.
######################
## Global variables ##  
######################
variable "aws_region" {
  type        = string
  description = "The AWS region to create resources in."
  default     = "ap-south-1"
}

variable "common_tags" {
  type        = map(string)
  description = "Tags shared across resources."
  default = {
    Project     = "VPC Setup"
    Environment = "Production"
  }
}

variable "vpc_name" {
  type        = string
  description = "The name of the VPC."
}
#####################
## AWS Networking  ##
#####################

variable "vpc_cidr" {
  type        = string
  description = "CIDR block for the VPC"
}

variable "public_subnet_cidrs" {
  description = "CIDR blocks for public subnets"
  type        = list(string)
  default     = ["10.0.1.0/24", "10.0.2.0/24"]
}

variable "private_subnet_cidrs" {
  description = "CIDR blocks for private subnets"
  type        = list(string)
  default     = ["10.0.3.0/24", "10.0.4.0/24"]
}

variable "availability_zones" {
  description = "Availability zones for subnets"
  type        = list(string)
  default     = ["ap-south-1a", "ap-south-1b"]
}
############################
## AWS Auto-Scaling Group ##  
############################

variable "instance_type" {
  type        = string
  default     = "t2.micro"
  description = "The instance type"
}

variable "ami_id" {
  type        = string
  default     = "ami-0ec0e125bb6c6e8ec"
  description = "ID of an Amazon Linux AMI in ap-south-1 (Mumbai); AMI IDs are region-specific"
}

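Because the ami_id default only exists in ap-south-1, changing aws_region also requires a new AMI ID. As a hedged alternative, a data source can look up the AMI at plan time. This sketch assumes Amazon Linux 2023 on x86_64 (which suits t2.micro) is acceptable; setup.sh should still work there, since Amazon Linux 2023 keeps yum as an alias for dnf. The data source name amazon_linux_2023 is hypothetical.

```hcl
# Hypothetical alternative to the hardcoded ami_id default: find the newest
# Amazon-owned Amazon Linux 2023 x86_64 AMI in the provider's region.
data "aws_ami" "amazon_linux_2023" {
  most_recent = true
  owners      = ["amazon"]

  # Standard AL2023 images are named like al2023-ami-2023.<version>-kernel-<k>-x86_64;
  # this pattern skips the "minimal" and ECS-optimized variants.
  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"]
  }
}

# In aws_launch_template.app_lt you would then use:
#   image_id = data.aws_ami.amazon_linux_2023.id
```

The trade-off is reproducibility: a lookup picks up newer AMIs over time, which changes the launch template on a later apply, whereas a pinned ID never changes on its own.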
  • terraform.tfvars: This file sets values for variables declared in variables.tf, here the two that have no default. Terraform loads a file with this name automatically.
vpc_cidr = "10.0.0.0/16"
vpc_name = "aws_prod"
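Because every subnet, Elastic IP, NAT Gateway, and route table is created with count over these lists, the same configuration can span more AZs just by supplying longer, index-aligned lists. A hypothetical terraform.tfvars for all three AZs in ap-south-1 might look like this; the CIDRs are illustrative and only need to be non-overlapping ranges inside 10.0.0.0/16.

```hcl
# Hypothetical terraform.tfvars for three AZs (intended for a fresh deployment)
vpc_cidr = "10.0.0.0/16"
vpc_name = "aws_prod"

# Position i of each list describes the same AZ.
availability_zones   = ["ap-south-1a", "ap-south-1b", "ap-south-1c"]
public_subnet_cidrs  = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
private_subnet_cidrs = ["10.0.11.0/24", "10.0.12.0/24", "10.0.13.0/24"]
```

Changing the CIDR of an existing subnet forces Terraform to replace it, and a third AZ also means a third billed NAT Gateway. The Auto Scaling Group picks up the new private subnet automatically through vpc_zone_identifier.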
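  • setup.sh: This file contains the user data that runs as a start-up script when each instance launches. It installs and starts the Apache web server (httpd) and writes a page showing the instance's hostname.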
#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd
echo "Hello from $(hostname -f)" > /var/www/html/index.html

That completes the Terraform configuration. Next, run terraform init to initialize the working directory and download the AWS provider, terraform plan to review the infrastructure Terraform will provision, and terraform apply to provision it.

terraform init

Output of terraform init

terraform plan

[Image: output of terraform plan]

terraform apply: Finally, type yes when prompted to approve the infrastructure creation.

Output of terraform apply.

If terraform apply succeeds, it prints the values defined in outputs.tf, as shown below.

[Image: outputs printed by terraform apply]

Now let's verify the resources in the AWS Management Console.

[Image: VPC resources in the AWS console]

The above image confirms that the networking components were created properly: the aws_prod VPC with 4 subnets (2 public and 2 private) spread across two AZs, the route tables, 2 NAT Gateways with their Elastic IPs, and the Internet Gateway.

[Image: Auto Scaling Group in the AWS console]

[Image: EC2 instances launched by the Auto Scaling Group]

[Image: security groups]

The above images confirm that the Auto Scaling Group, with its launch template, has been provisioned and is running 2 instances. The instances share a dedicated security group and are placed in different AZs, which provides high availability and fault tolerance.

[Image: aws-app-prod-lb load balancer]

[Image: aws-target-group registered targets]

The above images confirm that the aws-app-prod-lb ALB and the aws-target-group target group have been provisioned. The 2 instances launched by the ASG are registered as targets in this target group, which runs the health checks and distributes traffic among them.

The above images also show the ALB's DNS name. Opening it in a browser and refreshing shows responses from different instances (each page prints the hostname of the instance that served it, as written by setup.sh), which confirms that the ALB is distributing requests across the instances. The images below show this.

[Image: browser response from the first instance]

[Image: browser response from the second instance]

The above images show that the web server runs on instances in private subnets and is reachable from the internet only through the Application Load Balancer.

Conclusion

This Terraform setup provides a template for deploying a high-availability architecture in AWS. Spreading the subnets, NAT Gateways, and instances across two AZs lets the infrastructure keep serving traffic if one AZ fails, and with a scaling policy attached, the Auto Scaling Group can also adapt to load changes. The entire infrastructure is codified, which simplifies changes and versioning over time.

By automating infrastructure management with Terraform, organizations can reduce the potential for human error while deploying and scaling faster. This makes Terraform a valuable tool in modern cloud environments.

Thank you for reading my blog post, please do share it with your peers.

Keep Learning, Keep Sharing

Reference Links: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-example-private-subnets-nat.html
