Infrastructure as Code (IaC) is the practice of defining, configuring, and managing infrastructure through code. This approach treats underlying infrastructure components, such as networks, virtual machines, storage, and databases, the same way developers treat their application code.
When it comes to managing cloud resources, Terraform is the de facto standard. Terraform is an open-source IaC tool that allows engineers to define their infrastructure resources as code, using a declarative language called HashiCorp Configuration Language (HCL). It is provider-agnostic, meaning that it supports a large number of cloud providers (AWS, Azure, GCP, OCI, and many more), and the language in which you write the code doesn't change from one provider to another.
With Terraform, you are not limited to cloud providers: you can also write Terraform automations for Kubernetes, Helm, Artifactory, Aviatrix, and Spacelift, to name a few. You can even build your own provider. As long as your product exposes an API, a Terraform provider can be built on top of it.
ClickOps vs IaC
ClickOps is the term used to describe the manual management of IT infrastructure from the UI, by clicking through the portal to achieve a desired behavior (create/edit/delete resources). One may argue this process is enough for a small architecture, but starting this way has a big drawback: you cannot easily scale or replicate your configuration.
Imagine you are creating an EC2 Instance inside AWS. By using ClickOps, you are going to create it way faster than you would normally do it through Infrastructure as Code. But what happens if you need to create 10 EC2 instances? What about 100 EC2 instances?
Let's suppose it takes an engineer approximately two minutes to create an EC2 instance through the portal. For 100 instances, this adds up to 200 minutes, a little over three hours, and maybe some can be OK with that. But these 100 instances will also need to reside in a network, and they will require security groups, maybe some EBS storage, and other things that will again take a lot of time to configure.
Doing this manually is very error-prone, as our attention span cannot keep up with the large number of things that we have to do. By using Terraform, you can easily define all of these components as code, validate the code, plan to see what is going to happen, and in the end, deploy all resources in one go. Apart from that, you can easily scale and replicate your configuration without spending too much time.
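To make the comparison concrete, here is a minimal sketch of how the 100 instances could be described once and created in a single run. The variable names, the instance size, and the tag format are illustrative assumptions, and the AMI ID is something you would have to supply yourself.

```hcl
# Hypothetical input: an AMI ID that exists in your target region.
variable "ami_id" {
  description = "AMI to launch"
  type        = string
}

# Changing this one number scales the fleet up or down.
variable "instance_count" {
  description = "How many identical instances to create"
  type        = number
  default     = 100
}

resource "aws_instance" "worker" {
  count         = var.instance_count
  ami           = var.ami_id
  instance_type = "t3.micro" # illustrative size

  tags = {
    # count.index runs from 0 to instance_count - 1
    Name = "worker-${count.index}"
  }
}
```

Going from 10 to 100 instances then becomes a one-line change to instance_count rather than hours of clicking.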
Key Terraform Components
Terraform is stateful: it tracks your infrastructure by comparing the currently defined IaC configuration with a state file it generates after a Terraform run. This introduces a layer of complexity because you need to manage this state file.
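One common way to manage the state file is to keep it in a remote backend instead of on a laptop. The snippet below is only a sketch of that idea using the S3 backend; the bucket name and key are hypothetical, and the bucket is assumed to exist already.

```hcl
terraform {
  backend "s3" {
    bucket = "example-terraform-state"   # hypothetical bucket, must already exist
    key    = "network/terraform.tfstate" # path of the state object inside the bucket
    region = "us-east-1"
  }
}
```

After adding or changing a backend block, terraform init has to be run again so Terraform can switch to the new state location.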
If organizations use a combination of IaC and ClickOps, they will introduce drift and can even break their infrastructure resources. So if you are getting into IaC, and you really should, forget about ClickOps.
Providers
In essence, Terraform providers function as plugins that enable Terraform to communicate with specific infrastructure resources. These providers serve as a bridge between Terraform and the underlying infrastructure, converting Terraform configurations into relevant API calls and permitting Terraform to handle resources across numerous environments.
Example provider:
provider "aws" {
  region = "us-east-1"
}
Take a look at Terraform Providers Overview.
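Providers are usually declared together with a version constraint, so that terraform init downloads a known version of the plugin. A minimal sketch, assuming the AWS provider and an illustrative version constraint:

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # illustrative constraint, pin to a version you have tested
    }
  }
}

provider "aws" {
  region = "us-east-1"
}
```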
Resources
In Terraform, resources represent the infrastructure elements that can be managed, such as virtual machines, virtual networks, DNS entries, pods, and more.
Each resource is identified by a specific type, like "aws_instance" or "kubernetes_pod", and has a range of configurable attributes, such as instance size or type. Resources are the building blocks of Terraform, and every configuration that creates a piece of infrastructure has to contain at least one resource.
Example resource:
resource "azurerm_resource_group" "this" {
  name     = "rg1"
  location = "West Europe"
}
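Resources become more useful once they reference each other. The sketch below adds a virtual network (a hypothetical addition, not part of the original example) that reads its location and resource group name from the resource group, which also tells Terraform to create the resource group first.

```hcl
provider "azurerm" {
  features {} # required block for the azurerm provider
}

resource "azurerm_resource_group" "this" {
  name     = "rg1"
  location = "West Europe"
}

resource "azurerm_virtual_network" "this" {
  name          = "vnet1" # hypothetical name
  address_space = ["10.0.0.0/16"]

  # Referencing the resource group creates an implicit dependency on it.
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
}
```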
Variables
Terraform variables are similar to variables in any other programming language. They are used to better organize your code, make it easier to change its behavior, and make your configuration reusable. In Terraform, variables can have the following types: string, number, bool, list, set, map, object, and tuple. The any keyword can be used as a placeholder when the type should be inferred, and null is a value (meaning "nothing set") rather than a type.
A variable's default must be a literal value, so you cannot use variables to hold expressions built from resources or data sources. For that, you should use a local variable.
Example Variable:
variable "vpc_name" {
  description = "name of the vpc"
  type        = string
  default     = "vpc1"
}
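Variables can also carry structured values. As a sketch with made-up names and CIDR ranges, here is a list variable, an object variable, and a resource that reads from the object:

```hcl
variable "subnet_cidrs" {
  description = "CIDR ranges for the subnets (illustrative values)"
  type        = list(string)
  default     = ["10.0.1.0/24", "10.0.2.0/24"]
}

variable "vpc_settings" {
  description = "Name and address range of the VPC (illustrative values)"
  type = object({
    name       = string
    cidr_block = string
  })
  default = {
    name       = "vpc1"
    cidr_block = "10.0.0.0/16"
  }
}

resource "aws_vpc" "this" {
  # Object attributes are accessed with dot notation.
  cidr_block = var.vpc_settings.cidr_block

  tags = {
    Name = var.vpc_settings.name
  }
}

resource "aws_subnet" "this" {
  # One subnet per entry in the list.
  count      = length(var.subnet_cidrs)
  vpc_id     = aws_vpc.this.id
  cidr_block = var.subnet_cidrs[count.index]
}
```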
Outputs
An output is a convenient way to display the value of a particular data source, resource, local, or variable once Terraform has finished applying infrastructure changes. Outputs can also be used to expose attributes from a Terraform module.
Example Output:
output "vpc_name" {
  value = var.vpc_name
}
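Outputs are most useful when they expose attributes that only exist after the apply. A small sketch, assuming a VPC resource that is not part of the original example:

```hcl
resource "aws_vpc" "this" {
  cidr_block = "10.0.0.0/16" # illustrative range
}

output "vpc_id" {
  description = "ID assigned by AWS once the VPC exists"
  value       = aws_vpc.this.id
}

output "vpc_arn" {
  description = "ARN of the VPC"
  value       = aws_vpc.this.arn
}
```

If this code lived in a module, the calling configuration would read these values as module.<module_name>.vpc_id and module.<module_name>.vpc_arn.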
Data Sources
A data source retrieves data from an external source. You can use the result as an argument to resources when they are created or updated, or in locals to manipulate the data you receive.
Example Data Source:
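data "aws_ami" "ubuntu" {
  most_recent = true
  # Restrict the search to Canonical's AWS account so that a look-alike
  # public image named "ubuntu..." cannot be picked up by accident.
  owners = ["099720109477"]

  filter {
    name   = "name"
    values = ["ubuntu*"]
  }
}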
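The value returned by a data source is then referenced like any other attribute. As a sketch, with an extra architecture filter and an illustrative instance size that are assumptions of this example:

```hcl
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical's AWS account ID

  filter {
    name   = "name"
    values = ["ubuntu*"]
  }

  # Keep the image architecture compatible with the instance type below.
  filter {
    name   = "architecture"
    values = ["x86_64"]
  }
}

resource "aws_instance" "web" {
  # The AMI ID is looked up at plan time instead of being hard-coded.
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro" # illustrative size
}
```

In a real project you would likely narrow the name filter to one specific Ubuntu release, since "ubuntu*" matches many image families.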
Locals
In Terraform, a local variable assigns a name to an expression so that you can reuse it easily. In any programming language, when you have a complex expression, you don't want to write it a million times throughout your code. You usually define a variable that holds it or a function that implements it.
Local variables work in a similar way, and they can be used in conjunction with resource and data source attributes to build complex expressions.
Example Local:
locals {
  a_list = [for i in range(10) : i]
}
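A more typical use, sketched here with hypothetical names, is to build a naming prefix and a set of common tags once and reuse them across resources:

```hcl
variable "environment" {
  type    = string
  default = "dev"
}

locals {
  # Computed once, reused wherever a name or tag set is needed.
  name_prefix = "app-${var.environment}"

  common_tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

resource "aws_vpc" "this" {
  cidr_block = "10.0.0.0/16" # illustrative range

  # merge() combines the shared tags with a resource-specific Name tag.
  tags = merge(local.common_tags, { Name = "${local.name_prefix}-vpc" })
}
```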
Provisioners
Provisioners exist inside a resource and are used to run a local command, run a remote command, or copy a file from your local environment to a remote virtual machine.
They are considered a last-resort option because they are not part of Terraform's declarative model. You should typically use cloud-init to run scripts on your VM during the bootstrap phase or, if you can, use a configuration management tool like Ansible for this.
Example Provisioner:
resource "null_resource" "this" {
  provisioner "local-exec" {
    command = "ls -l"
  }
}
Learn more about Terraform Provisioners and why you should avoid them.
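As a sketch of the cloud-init alternative mentioned above, a bootstrap script can be passed to an EC2 instance through user_data, which cloud-init runs on first boot on most common images. The AMI variable, instance size, and script contents are assumptions made for this example.

```hcl
# Hypothetical input: an AMI whose image ships with cloud-init.
variable "ami_id" {
  type = string
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = "t3.micro" # illustrative size

  # Executed once at first boot, no provisioner or SSH connection needed.
  user_data = <<-EOT
    #!/bin/bash
    echo "bootstrapped at first boot" > /tmp/bootstrap.txt
  EOT
}
```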
Note: Terraform releases after version 1.5.x are published under the BUSL license, while versions up to and including 1.5.x remain open-source. OpenTofu is an open-source fork of Terraform, based on Terraform version 1.5.6, that expands on Terraform's existing concepts and offerings. It retains the features and functionality that made Terraform popular among developers while also introducing improvements and enhancements. OpenTofu works with your existing Terraform state file, so migrating to it should be straightforward.
Minimal Structure of a Terraform Project
A basic Terraform project contains the following files:
.
├── main.tf
├── variables.tf
└── outputs.tf
The main.tf file typically contains the core resource declarations and providers. This is the place where you would define your EC2 instance configurations and how to authenticate to the AWS provider.
The variables.tf file defines the input variables that allow engineers to customize the project. They are referenced in main.tf and can easily change the behavior of a particular resource, which makes the automation easy to reuse.
In the outputs.tf file, engineers define the outputs that are returned to the console after a terraform apply. As a best practice, outputs shouldn't return all the data from a resource, only the fields that are of interest.
Documentation is really important, so having a README.md file inside your Terraform repository that explains how to use the automation (including descriptions of variables and outputs) really helps in understanding what has been implemented. To easily generate the descriptions of variables and outputs, you can leverage terraform-docs.
This is just a basic structure, but it can be customized depending on the complexity of the automation and, of course, the requirements of the organizations.
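As a rough sketch of how the three files might fit together, the block below shows possible contents for each one, separated by comments. The variable names, defaults, and instance resource are illustrative assumptions.

```hcl
# ---- main.tf ----
provider "aws" {
  region = var.region
}

resource "aws_instance" "this" {
  ami           = var.ami_id
  instance_type = var.instance_type
}

# ---- variables.tf ----
variable "region" {
  description = "AWS region to deploy into"
  type        = string
  default     = "us-east-1"
}

variable "ami_id" {
  description = "AMI to launch (no default, must be supplied)"
  type        = string
}

variable "instance_type" {
  description = "EC2 instance size"
  type        = string
  default     = "t3.micro"
}

# ---- outputs.tf ----
output "instance_id" {
  description = "ID of the created instance"
  value       = aws_instance.this.id
}
```

Terraform loads every .tf file in the directory, so the split is purely for readability.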
Reasons to Choose Terraform for Your IaC
All the components presented above are the building blocks for creating Terraform automations. The declarative syntax that Terraform offers is easy to understand and use, and it enables engineers to get up to speed quickly with these concepts.
Terraform's init/plan/apply workflow helps prevent unintended changes to your infrastructure by previewing changes before they are actually made. The initialization step also ensures the required providers are downloaded, so you don't need to push them to your VCS repository to take advantage of them.
Nowadays, more and more people are using Terraform. The community is large and very active, and there are a lot of resources available, like documentation and tutorials, that can be easily leveraged.
The code you build with Terraform can be packaged as a module and shared across your organization easily. There are many other features available that keep your code DRY, such as for_each, count, functions, ternary operators, loops, and dynamic blocks.
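To illustrate one of these features, here is a sketch of a dynamic block that generates one ingress rule per port from a list instead of repeating the rule by hand. The security group name, the ports, and the open CIDR range are assumptions chosen for the example.

```hcl
variable "ingress_ports" {
  description = "TCP ports to open (illustrative values)"
  type        = list(number)
  default     = [80, 443]
}

resource "aws_security_group" "web" {
  name = "web-sg" # hypothetical name

  # One ingress block is generated for each element of the list.
  dynamic "ingress" {
    for_each = var.ingress_ports
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"] # open to the world, for demonstration only
    }
  }
}
```

Adding a port to the list adds a rule on the next apply, with no copy and paste involved.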
Because it's cloud-agnostic, you can use Terraform to build multi-cloud automations without having to combine multiple tools to achieve this.
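As a minimal sketch of what multi-cloud means in practice, a single configuration can declare two providers and manage a resource in each. The bucket name is hypothetical, and credentials for both clouds are assumed to come from the environment.

```hcl
provider "aws" {
  region = "us-east-1"
}

provider "azurerm" {
  features {} # required block for the azurerm provider
}

# Managed through the AWS API.
resource "aws_s3_bucket" "artifacts" {
  bucket = "example-artifacts-bucket" # hypothetical, bucket names must be globally unique
}

# Managed through the Azure API, in the same plan and apply.
resource "azurerm_resource_group" "this" {
  name     = "rg1"
  location = "West Europe"
}
```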
Example Kubernetes Deployment on Azure Using Terraform
The example code can be found here.
In this example, we are using two resources:
- azurerm_resource_group → used to create resource groups inside of Azure
- azurerm_kubernetes_cluster → used to create the Kubernetes cluster inside of Azure
For this example, cluster management is free; you only pay for the underlying nodes of the cluster.
On both resources, we are using for_each so that we can create as many resources of that particular type as we want.
resource "azurerm_resource_group" "this" {
  for_each = var.resource_groups
  name     = each.key
  location = each.value.location
}

resource "azurerm_kubernetes_cluster" "this" {
  for_each            = var.kube_params
  name                = each.key
  location            = azurerm_resource_group.this[each.value.rg_name].location
  resource_group_name = azurerm_resource_group.this[each.value.rg_name].name
  ...
}
If you look at the above resources, you can see that the link between them is created in the Kubernetes cluster resource, specifically in the location and resource_group_name arguments. Because the cluster references the resource group, Terraform ensures the resource group is created first, and we can access its location and name attributes inside the Kubernetes cluster.
In that way, the cluster and the resource group in which the cluster will be created will reside in the same location.
To declare the variables, we are using map(object) types to take full advantage of for_each, and we are also using optional attributes with default values to make the code easier to use.
variable "kube_params" {
  description = "AKS parameters"
  type = map(object({
    rg_name             = string
    dns_prefix          = string
    np_name             = string
    tags                = optional(map(string), {})
    vm_size             = optional(string, "Standard_B2s")
    client_id           = optional(string, null)
    client_secret       = optional(string, null)
    enable_auto_scaling = optional(bool, false)
    max_count           = optional(number, 1)
    ....
To keep it simple, we are providing the values of these variables in the defaults block, but you can use terraform.tfvars or a *.auto.tfvars file, or environment variables to pass these values.
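As a sketch of the tfvars option, the same values used as defaults in this example could instead live in a terraform.tfvars file. This assumes the elided part of the variable definition contains no additional required attributes.

```hcl
# terraform.tfvars (loaded automatically by terraform plan and terraform apply)
resource_groups = {
  rg1 = {
    location = "westus"
  }
}

kube_params = {
  aks1 = {
    rg_name    = "rg1"
    dns_prefix = "kube"
    np_name    = "np1"
  }
}
```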
You can even use a local variable to specify the values, or if you want to take it to the next level, you can use a local variable that reads the contents of a YAML file with values by using the file and yamldecode functions.
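The YAML approach could look roughly like the sketch below. The file name kube_params.yaml and the output are hypothetical, and the resources would then iterate over local.kube_params instead of var.kube_params.

```hcl
# kube_params.yaml (hypothetical file next to the .tf files) could contain:
#   aks1:
#     rg_name: rg1
#     dns_prefix: kube
#     np_name: np1

locals {
  # file() reads the raw text, yamldecode() turns it into a Terraform map.
  kube_params = yamldecode(file("${path.module}/kube_params.yaml"))
}

output "cluster_names" {
  description = "Keys found in the YAML file"
  value       = keys(local.kube_params)
}
```

Note that values read this way bypass the type checking and optional defaults of the variable definition, so the YAML file has to provide every attribute the resources use.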
With the default values, the two variables (resource_groups first, then kube_params) look like this:
# default of the resource_groups variable
default = {
  rg1 = {
    location = "westus"
  }
}

# default of the kube_params variable
default = {
  aks1 = {
    rg_name    = "rg1"
    dns_prefix = "kube"
    np_name    = "np1"
  }
}
For this configuration, we have declared two outputs. The first shows a map of name and location pairs for the resource groups, and the second shows some details about the Kubernetes cluster in the format name => {id, fqdn}.
output "resource_groups" {
  description = "Resource Group Outputs"
  value       = { for rg in azurerm_resource_group.this : rg.name => rg.location }
}

output "aks" {
  description = "AKS Outputs"
  value       = { for kube in azurerm_kubernetes_cluster.this : kube.name => { "id" : kube.id, "fqdn" : kube.fqdn } }
}
To run this code, we first have to initialize the working directory by using terraform init.
Initializing the backend...
Initializing provider plugins...
- Finding latest version of hashicorp/azurerm...
- Installing hashicorp/azurerm v3.49.0...
- Installed hashicorp/azurerm v3.49.0 (signed by HashiCorp)
Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
After that, we can apply the code using the terraform apply command.
azurerm_resource_group.this["rg1"]: Creating...
azurerm_resource_group.this["rg1"]: Creation complete after 3s [id=/subscriptions/subid/resourceGroups/rg1]
azurerm_kubernetes_cluster.this["aks1"]: Creating...
azurerm_kubernetes_cluster.this["aks1"]: Still creating... [10s elapsed]
azurerm_kubernetes_cluster.this["aks1"]: Still creating... [20s elapsed]
azurerm_kubernetes_cluster.this["aks1"]: Still creating... [30s elapsed]
...
azurerm_kubernetes_cluster.this["aks1"]: Still creating... [3m50s elapsed]
azurerm_kubernetes_cluster.this["aks1"]: Creation complete after 3m52s [id=/subscriptions/subid/resourceGroups/rg1/providers/Microsoft.ContainerService/managedClusters/aks1]
Apply complete! Resources: 2 added, 0 changed, 0 destroyed.
Outputs:
aks = {
  "aks1" = {
    "fqdn" = "kube-pocuwyy7.hcp.westus.azmk8s.io"
    "id" = "/subscriptions/subid/resourceGroups/rg1/providers/Microsoft.ContainerService/managedClusters/aks1"
  }
}
resource_groups = {
  "rg1" = "westus"
}
You can now see the cluster in the Azure Portal.
Enhance Your IaC Workflow with Spacelift
Terraform is really powerful, but in order to achieve an end-to-end secure GitOps approach, you need to use a product that can run your Terraform workflows.
Enter Spacelift. Not only does Spacelift take care of your Terraform workflows, but it can also help build workflows for Kubernetes, Pulumi, and CloudFormation. Spacelift is GitOps native, and by also taking advantage of stack dependencies, you can build really sophisticated workflows.
Your workflows will most likely require policies to ensure the necessary guardrails for your infrastructure. Getting notified when something goes wrong is just as important, and Spacelift's built-in features help with both.
Integrations with the major cloud providers let you avoid static credentials, which can easily be copied if you are not careful with them.
Integrating security tools in your workflows can be done easily by using Custom Inputs. With this feature, you not only integrate the tools, but you can also run policies on their results to ensure engineers are not introducing vulnerabilities with their code.
If Terraform modules make your code DRY, check out Spacelift's Blueprints feature, which really takes reusability to the next level.
Let's reuse the above example and create a stack for it in Spacelift. We will also apply a policy that ensures that people are not changing the size of the VM. I would suggest creating your own repository that holds the above code to integrate everything.
First, go to stacks and select create a new stack. Add a name for the stack; optionally, you can also add labels and a description.
After that, click continue, and on the Integrate VCS tab, select the repository. You can leave everything else as default.
In the configure backend tab, you can select the backend (in our case, it will be Terraform), the Terraform version, whether or not Spacelift manages your state, and whether you want smart sanitization enabled.
Select continue, and in the define behavior tab, leave everything as default.
Now your stack has been created. You need to handle the authentication to Azure before you start running the stack. There are multiple ways in which this can be done, and Spacelift's documentation explains them thoroughly.
After you are done with this, you can start running your code. By triggering a run, in the end, you are going to see an output of terraform plan. If you want to create the resources inside this plan, you will need to confirm it. Otherwise, you can easily discard it and make other changes to your code.
After confirming the run, an apply job gets triggered and shows its output.
In that output, the subscription ID is masked automatically, which also makes for easier demonstrations without any fear of leaking sensitive information.
The workflow can be easily extended each step of the way by adding commands before and after phases, changing the runner image, integrating security tools, adding policies, and more.
Key Points
Infrastructure as Code has become a standard nowadays in the DevOps and Cloud Engineering world. It makes it easier to create, update and replicate your infrastructure while mitigating human errors. Terraform is very powerful, and the community around it is really big.
Choosing Terraform as your Infrastructure as Code tool and enhancing your workflow with Spacelift to achieve GitOps reduces your time to market, makes the experience more secure, and helps you easily integrate with third-party products.
Written by Flavius Dinu.