Continuous Delivery on AWS With Terraform and Travis CI

Frank Rosner - Jul 4 '18 - Dev Community

This blog post is part of my AWS series.

Introduction

In the previous posts we introduced and extensively used Terraform to automate infrastructure deployments. If you are aiming at true continuous delivery a high degree of automation is crucial. Continuous delivery (CD) is about producing software in short cycles with high confidence, reducing the risk of delivering changes.

In this blog post we want to combine Terraform with an automated build pipeline on Travis CI. In order to use Terraform in a shared setting we have to configure it to use remote state, as local state cannot be used for any project which involves multiple developers or automated build pipelines. The application we are deploying will be a static website generated by Jekyll.

The remainder of the post is structured as follows. In the first section we will briefly discuss the overall solution architecture, putting the focus on the continuous deployment infrastructure. The next section is going to elaborate on two solutions how to provision the remote state resources using Terraform. Afterwards there will be a walk through the implementation of the remote state bootstrapping, the static website deployment, and the automation using Travis CI. We are closing the blog post by summarizing the main ideas.

Architecture

architecture overview

The above figure visualizes the solution architecture including the components for continuous integration (CI) and CD. The client is the developer in this case as we are looking at the setup from the development point of view.

As soon as a developer pushes new changes to the remote GitHub repository it triggers a Travis CI build. Travis CI is a hosted build service that is free to use for open source projects. Travis then builds the website artifacts, deploys the infrastructure, and pushes the artifacts to production.

We are using an S3 backend with DynamoDB for Terraform. Terraform will store the state within S3 and use DynamoDB to acquire a lock while performing changes. The lock is important to prevent two Terraform processes from modifying the same state concurrently.

To use the S3 remote state backend we need to create the S3 bucket and DynamoDB table beforehand. This bootstrapping is also automated with Terraform. But how do we manage infrastructure with Terraform that is itself required to use Terraform? The next section will discuss two approaches to solve this 🐔 & 🥚 problem.

Remote State Chicken And Egg Problem

How can we use Terraform to set up the S3 bucket and DynamoDB table we want to use for the remote state backend? First we create the remote backend resources with local state. Then we somehow need to share this state to allow modifications of the backend resources later on. From what I can tell there are two viable solutions to do that:

  1. Shared local state. Commit local state to your version control and share it in a remote repository.
  2. Migrated state. Migrate local state to remote state backend.

Both solutions involve creating the remote state resources using local state. They differ in how the state for provisioning the remote state resources is shared. While the first option is easy to set up, there are two major risks that need to be taken into account:

  • Terraform state might contain secrets. In the case of only the S3 bucket and DynamoDB table there is just one potentially problematic value: the AWS access key. If you are working with a private repository, this might not be a huge issue. When working on open source code it might be useful to encrypt the state file before committing it. You can do this with OpenSSL or more specialized tools like Ansible Vault.
  • Shared local state has no locking or synchronization mechanism. When publishing your local Terraform state to the remote source code repository you have to manually make sure to keep this state file in sync with all developers. If someone is making modifications to the resources he or she has to commit and push the updated state file and make sure that no one else is modifying the infrastructure at the same time.
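For the first option, the encryption step mentioned above could look roughly like the following OpenSSL sketch. The passphrase handling here is deliberately simplified; in practice you would keep the passphrase out of the repository, for example in a CI secret:

```shell
# Simplified demo: encrypt a local state file symmetrically before committing it.
export STATE_PASSWORD="not-a-real-passphrase"  # in practice: never store this in the repo
echo '{"example": "state"}' > terraform.tfstate

# Encrypt with AES-256; -pbkdf2 strengthens the key derivation.
openssl enc -aes-256-cbc -pbkdf2 -pass env:STATE_PASSWORD \
  -in terraform.tfstate -out terraform.tfstate.enc

# Decrypt again before running Terraform.
openssl enc -d -aes-256-cbc -pbkdf2 -pass env:STATE_PASSWORD \
  -in terraform.tfstate.enc -out terraform.tfstate.dec

diff terraform.tfstate terraform.tfstate.dec && echo "round trip ok"
```

You would commit only the `.enc` file and decrypt it before every `terraform` invocation.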

The second option is a bit safer with regard to the above-mentioned issues. S3 supports encryption at rest out of the box and allows fine-grained access control on the bucket. Also, with DynamoDB used for locking, two parties cannot modify the resources concurrently. The disadvantage is that the solution is more complex.

After we migrate the local state to the created remote state backend, it will contain the state for the backend itself plus the application infrastructure state. Luckily Terraform provides a built-in way to isolate state of different environments: Workspaces. We can create a separate workspace for the backend resources to avoid interference between changes in our backend infrastructure and application infrastructure.

Working with workspaces is a bit difficult to wrap your head around, so we are going to tackle this option in the course of this post to get to know it in detail. In practice I am not sure whether the increased complexity is worth the effort, especially as you usually do not touch the backend infrastructure unless you want to shut down the project. The next section will explain the bootstrapping and the application implementation and deployment step by step.

Implementation

Development Tool Stack

To develop the solution we are using the following tools:

  • Terraform v0.11.7
  • Jekyll 3.8.3
  • Git 2.15.2
  • IntelliJ + Terraform Plugin

The source code is available on GitHub. Now let's look into the implementation details of each component.

Remote State Bootstrapping And Configuration

We will organize our Terraform files in workspaces and folders. Workspaces isolate the backend resource state from the application resource state. Folders will be used to organize the Terraform resource files.

We will create two workspaces: state and prod. The state workspace will manage the remote state resources, i.e. the S3 bucket and the DynamoDB table. The prod workspace will manage the production environment of our website. You can add more workspaces for staging or testing later but this is beyond the scope of this blog post.

We will create three folders containing Terraform files: bootstrap, backend, and website. The next listing outlines the directory and file structure of the project.

.
├── locals.tf
├── providers.tf
├── backend
│   ├── backend.tf
│   ├── backend.tf.tmpl
│   ├── locals.tf -> ../locals.tf
│   ├── providers.tf -> ../providers.tf
│   └── state.tf -> ../bootstrap/state.tf
├── bootstrap
│   ├── locals.tf -> ../locals.tf
│   ├── providers.tf -> ../providers.tf
│   └── state.tf
└── website
    ├── backend.tf -> ../backend/backend.tf
    ├── locals.tf -> ../locals.tf
    ├── providers.tf -> ../providers.tf
    └── website.tf

The project root will contain a shared AWS provider configuration providers.tf, as well as a project name variable inside locals.tf. We will go into details about the file contents later.

In addition to the shared files bootstrap contains state.tf, which defines the S3 bucket and DynamoDB table backend resources. We share them across folders using symbolic links. The backend folder will have the same resources but uses the already present S3 backend defined in backend.tf. When switching from bootstrap to backend after the initial provisioning, Terraform will migrate the local state to the remote backend.
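As a sketch, the symbolic links for the backend folder can be created like this from the project root (assuming the shared files already exist; the `mkdir`/`touch` lines only make the example self-contained):

```shell
# Recreate a minimal version of the folder structure.
mkdir -p backend bootstrap
touch locals.tf providers.tf bootstrap/state.tf

# Relative links, so the repository stays relocatable after cloning.
ln -sf ../locals.tf backend/locals.tf
ln -sf ../providers.tf backend/providers.tf
ln -sf ../bootstrap/state.tf backend/state.tf

ls -l backend
```

Because the links are relative, they survive `git clone` on any machine that supports symlinks.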

The website folder contains the remote backend configuration and all resources related to the actual website deployment. We will access backend and bootstrap from the state workspace and website from prod and any other additional workspace related to the application.

The next listing shows what the bootstrap/state.tf file looks like. The project_name local variable is defined within the shared locals.tf file. The current aws_caller_identity and aws_region are defined within the shared providers.tf file.

# state.tf

locals {
  state_bucket_name = "${local.project_name}-${data.aws_caller_identity.current.account_id}-${data.aws_region.current.name}"
  state_table_name = "${local.state_bucket_name}"
}

resource "aws_dynamodb_table" "locking" {
  name           = "${local.state_table_name}"
  read_capacity  = "20"
  write_capacity = "20"
  hash_key       = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

resource "aws_s3_bucket" "state" {
  bucket = "${local.state_bucket_name}"
  region = "${data.aws_region.current.name}"

  versioning {
    enabled = true
  }

  server_side_encryption_configuration {
    "rule" {
      "apply_server_side_encryption_by_default" {
        sse_algorithm = "AES256"
      }
    }
  }

  tags {
    Name = "terraform-state-bucket"
    Environment = "global"
    project = "${local.project_name}"
  }
}

output "BACKEND_BUCKET_NAME" {
  value = "${aws_s3_bucket.state.bucket}"
}

output "BACKEND_TABLE_NAME" {
  value = "${aws_dynamodb_table.locking.name}"
}

Here we define the S3 bucket and enable encryption as well as versioning. Encryption is important because Terraform state might contain secret variables. Versioning is highly recommended to be able to roll back in case of accidental state modifications.

We also configure the DynamoDB table which is used for locking. Terraform uses an attribute called LockID so we have to create it and make it the primary key. When using DynamoDB without auto scaling you have to specify a maximum read and write capacity before request throttling kicks in. To be honest I think you should go with the minimum here.
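For a table that is only used for locking, the provisioned throughput in the listing above is generous; Terraform issues only a handful of lock requests per run, so the minimum of one unit each would do. A fragment like this (replacing the corresponding lines in the `aws_dynamodb_table` resource) would suffice:

```hcl
# Inside the aws_dynamodb_table "locking" resource:
# locking causes only a few requests per Terraform run,
# so the provisioned minimum of one unit each is enough.
read_capacity  = "1"
write_capacity = "1"
```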

We can now create the state workspace and start bootstrapping with local state:

  • terraform workspace new state
  • terraform init bootstrap
  • terraform apply bootstrap

After the S3 bucket and DynamoDB table are created we will migrate the local state. This is done by initializing the state resources with the newly created remote backend. Before we can proceed, however, we need to include the BACKEND_BUCKET_NAME and BACKEND_TABLE_NAME variables in backend/backend.tf. I did it by generating the file using envsubst and backend/backend.tf.tmpl:

# backend.tf.tmpl

terraform {
  backend "s3" {
    bucket         = "${BACKEND_BUCKET_NAME}"
    key            = "terraform.tfstate"
    region         = "eu-central-1"
    dynamodb_table = "${BACKEND_TABLE_NAME}"
  }
}

Now let's initialize the remote backend resources to migrate the local state.

$ terraform init backend

Initializing the backend...
Do you want to migrate all workspaces to "s3"?
  Both the existing "local" backend and the newly configured "s3" backend support
  workspaces. When migrating between backends, Terraform will copy all
  workspaces (with the same names). THIS WILL OVERWRITE any conflicting
  states in the destination.

  Terraform initialization doesn't currently migrate only select workspaces.
  If you want to migrate a select number of workspaces, you must manually
  pull and push those states.

  If you answer "yes", Terraform will migrate all states. If you answer
  "no", Terraform will abort.

  Enter a value: yes


Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

That's it! We created the remote state backend using local state and migrated the local state afterwards. Next we will deploy some actual application resources using the remote state backend.

Static Website

I chose a static webpage as an example application for this post. The reason is that the main focus lies on automation and working with remote state, so the application itself is kept rather simple. The website is generated using Jekyll and the source code is stored in website/static.

To make the website publicly available we will use another S3 bucket and configure it to serve files as a website. Here is the configuration of the bucket within website/website.tf.

# website.tf

locals {
  website_bucket_name = "${local.project_name}-${terraform.workspace}-website"
}

resource "aws_s3_bucket" "website" {
  bucket = "${local.website_bucket_name}"
  acl    = "public-read"
  policy = <<POLICY
{
    "Version":"2012-10-17",
    "Statement":[
      {
        "Sid":"PublicReadGetObject",
        "Effect":"Allow",
        "Principal": "*",
        "Action":["s3:GetObject"],
        "Resource":["arn:aws:s3:::${local.website_bucket_name}/*"]
      }
    ]
}
POLICY

  website {
    index_document = "index.html"
    error_document = "error.html"
  }

  tags {
    Environment = "${terraform.workspace}"
  }
}

We configure the bucket to be publicly readable using the appropriate ACL and policy. We can set up website hosting using the website stanza. The index_document will be served when no specific resource is requested, while the error_document is used if the requested resource does not exist.

Next we have to specify the HTML and CSS files. This is a bit cumbersome as we cannot tell Terraform to upload a whole folder structure. We will also output the URL which can be used to access the website in the end.

# website.tf

locals {
  site_root = "website/static/_site"
  index_html = "${local.site_root}/index.html"
  about_html = "${local.site_root}/about/index.html"
  post_html = "${local.site_root}/jekyll/update/2018/06/30/welcome-to-jekyll.html"
  error_html = "${local.site_root}/404.html"
  main_css = "${local.site_root}/assets/main.css"
}

resource "aws_s3_bucket_object" "index" {
  bucket = "${aws_s3_bucket.website.id}"
  key    = "index.html"
  source = "${local.index_html}"
  etag   = "${md5(file(local.index_html))}"
  content_type = "text/html"
}

resource "aws_s3_bucket_object" "post" {
  bucket = "${aws_s3_bucket.website.id}"
  key    = "jekyll/update/2018/06/30/welcome-to-jekyll.html"
  source = "${local.post_html}"
  etag   = "${md5(file(local.post_html))}"
  content_type = "text/html"
}

resource "aws_s3_bucket_object" "about" {
  bucket = "${aws_s3_bucket.website.id}"
  key    = "about/index.html"
  source = "${local.about_html}"
  etag   = "${md5(file(local.about_html))}"
  content_type = "text/html"
}

resource "aws_s3_bucket_object" "error" {
  bucket = "${aws_s3_bucket.website.id}"
  key    = "error.html"
  source = "${local.error_html}"
  etag   = "${md5(file(local.error_html))}"
  content_type = "text/html"
}

resource "aws_s3_bucket_object" "css" {
  bucket = "${aws_s3_bucket.website.id}"
  key    = "assets/main.css"
  source = "${local.main_css}"
  etag   = "${md5(file(local.main_css))}"
  content_type = "text/css"
}

output "url" {
  value = "http://${local.website_bucket_name}.s3-website.${aws_s3_bucket.website.region}.amazonaws.com"
}

Before we deploy the changes we should create a new workspace. The state workspace will only be used in case we need to make modifications to the remote state backend resources. We'll call the new workspace prod and use it to initialize and deploy the website resources.

  • terraform workspace new prod
  • terraform init website
  • cd website/static && jekyll build && cd -
  • terraform apply website
  • 🎉🎉🎉

website deployed

"But what about continuous delivery", I hear you ask? The following section is going to cover setting up the automated Travis job.

Travis Job

To use Travis CI we have to provide a build configuration file called .travis.yml. Simply put, it tells the build server which commands to execute. Here is what we are going to do:

# .travis.yml

language: generic

install:
  - gem install bundler jekyll

script:
  - ./build.sh

The build.sh file contains the actual logic. While it is possible to put all the commands in the YAML file directly, it is a bit clumsy. The following listing contains the contents of the build script. Note that we committed the Terraform Linux binary to the repository so we do not have to download it on every build and can be sure to use the correct version.

#!/bin/bash
# build.sh

# Abort the build on the first failing command.
set -e

cd website/static
bundle install
bundle exec jekyll build
cd -

./terraform-linux init
./terraform-linux validate website

if [[ $TRAVIS_BRANCH == 'master' ]]
then
    ./terraform-linux workspace select prod
    ./terraform-linux apply -auto-approve website
fi

Notice that we are only deploying the changes to production from the master branch. On other branches Terraform only validates the syntax and checks that all required variables are defined.

To enable the Terraform binary to talk to AWS from within the build server, we also need to set up AWS credentials. This can be done by defining secret environment variables in the build settings:

travis environment variables

Then we only have to enable the repository on the Travis dashboard and trigger a build either by pushing a commit or using the UI. If everything works as expected you will receive a green build:

green build

Conclusion

In this post we have seen how to use Terraform in an automated setting for continuous deployment. Using a combination of the AWS remote state backend and workspaces, we were able to solve the chicken and egg problem when provisioning the remote state resources. We then deployed a Jekyll-generated static website using S3.

In my opinion, however, the solution with the state migration and all the symbolic links is rather complex. If possible I would probably go for local state only and store it directly within the repository. What do you think? Did you ever use remote state with Terraform? How did you provision it? Let me know in the comments.


If you liked this post, you can support me on ko-fi.
