Lately at work, I have been using Terraform for our Infrastructure as Code (IaC) requirements for AWS workloads. As part of this learning journey, I also earned the Terraform Associate certification.
I wanted to explore Terraform for non-AWS use cases. At work, we are building a unified data platform for our data needs using Snowflake, so I thought I would try to automate Snowflake resource deployments using Terraform.
Snowflake is a cloud-native data platform offered as SaaS. In the ever-evolving world of data platforms, it has emerged as a leading cloud-based data warehousing solution.
Manual provisioning of Snowflake resources like Databases, Schemas, Tables, Grants, and Warehouses is time consuming and prone to errors. This is where Infrastructure as Code (IaC) tools like Terraform and CI/CD pipelines using GitHub Actions make life easier.
Terraform is an open-source, cloud-agnostic IaC tool that allows us to define and provision cloud infrastructure using a high-level, declarative configuration language (HCL). Terraform talks to each platform through plugins called providers.
GitHub Actions enables us to create efficient CI/CD pipelines based on code in a GitHub repository.
This blog post is a step-by-step guide on how to deploy resources to Snowflake using Terraform and GitHub Actions, leveraging our repository cicd-with-terraform-for-snowflake. We will deploy a Database, a Schema, Grants and a Table to two different environments (DEV and PROD) on the same Snowflake instance. We will use release-based deployment pipelines to deploy to the PROD environment.
Some pre-requisites and assumptions:
- An AWS Account with an S3 bucket and DynamoDB table already provisioned - we will be using these for Terraform remote backend and State locking.
- AWS credentials (Access Key and Secret Access Key) for the above AWS account configured as GitHub repository secrets.
- A Snowflake instance, a user with ACCOUNTADMIN permissions, and key-pair authentication set up for that user. The related private key is configured as a GitHub repository secret.
- A Snowflake role TF_READER pre-created in the Snowflake instance. We will be deploying grants for this role using Terraform resources.
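The remote backend and provider prerequisites above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the repository's actual configuration: the bucket name, state key, region, and lock table name are placeholders, and the provider source address is an assumption about the registry namespace in use. The S3 backend expects the DynamoDB lock table to have a string partition key named LockID.

```hcl
terraform {
  required_providers {
    snowflake = {
      # Assumption: registry namespace of the Snowflake provider in use.
      source = "Snowflake-Labs/snowflake"
    }
  }

  # Remote state in S3, with state locking through DynamoDB.
  backend "s3" {
    bucket         = "my-terraform-state-bucket"       # placeholder bucket name
    key            = "snowflake/dev/terraform.tfstate" # placeholder; use a separate key per environment
    region         = "us-east-1"                       # placeholder region
    dynamodb_table = "my-terraform-locks"              # placeholder lock table name
    encrypt        = true
  }
}

provider "snowflake" {
  # The account, user, and private key are expected to come from environment
  # variables set by the pipeline from GitHub secrets. The exact argument and
  # variable names depend on the provider version, so they are left out here.
  role = "ACCOUNTADMIN"
}
```

Keeping a separate state key for DEV and PROD means a plan or apply in one environment never touches the other environment's state.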
Setting up repository:
Clone the repository to your local machine:
git clone https://github.com/shekar-ym/cicd-with-terraform-for-snowflake.git
cd cicd-with-terraform-for-snowflake
Repository Structure:
- .github/workflows/: Contains the GitHub Actions workflow files that automate the deployment process.
- dev and prod folders contain the Terraform files for the development and production environments respectively.
- module folder contains the Terraform module definition, which is used for provisioning Snowflake resources like Database, Schema and Tables.
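To illustrate how an environment folder could consume the shared module, here is a minimal sketch of what a file in the dev folder might contain. The input names match the variables referenced by the module resources shown later in this post, and the database and schema values are the ones deployed to DEV in this walkthrough. The module label, the relative path, and the retention value are assumptions for illustration.

```hcl
# Hypothetical dev/main.tf: calls the shared module with DEV-specific values.
module "snowflake_resources" {
  source = "../module" # assumed relative path to the module folder

  env_name            = "DEV"
  database            = "TASTY_BYTES_DEV"
  schema              = "RAW_POS"
  time_travel_in_days = 1 # assumed value, not taken from the repository
}
```

The prod folder would hold the same call with PROD values (for example TASTY_BYTES_PROD), so the two environments differ only in their inputs and state.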
Terraform Modules:
Modules are containers for multiple resources that are used together. Modules are used to package and reuse resource configurations with Terraform.
In our case, we will be using a module to define the configuration for Snowflake database, schema, grant, table and warehouse resources. This module will be reused to create resources for the development and production environments.
For example, below is the module resource configuration for database and schema:
resource "snowflake_database" "tf_database" {
  name                        = var.database
  comment                     = "Database for ${var.env_name}"
  data_retention_time_in_days = var.time_travel_in_days
}

resource "snowflake_schema" "tf_schema" {
  name     = var.schema
  database = snowflake_database.tf_database.name
  comment  = "Schema for ${var.env_name}"
}
Refer to the GitHub repository for other module resource configurations.
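For readers who want a feel for the rest of the module without opening the repository, the fragment below sketches the input variable declarations plus a warehouse and a grant for the pre-created TF_READER role. It is meant to sit alongside the database and schema resources above and is not copied from the repository: the variable named warehouse, the resource labels, the warehouse sizing, and the choice of the USAGE privilege are all assumptions. The grant resource shown assumes a recent release of the Snowflake provider; older releases use different grant resources.

```hcl
# Inputs referenced by the database and schema resources above.
variable "env_name" {
  type        = string
  description = "Environment name, for example DEV or PROD"
}

variable "database" {
  type        = string
  description = "Name of the Snowflake database to create"
}

variable "schema" {
  type        = string
  description = "Name of the schema to create inside the database"
}

variable "time_travel_in_days" {
  type        = number
  description = "Data retention (Time Travel) period in days"
}

# Hypothetical input: not part of the snippet shown in this post.
variable "warehouse" {
  type        = string
  description = "Name of the Snowflake warehouse to create"
}

# Small warehouse that suspends itself when idle (sizing is an assumption).
resource "snowflake_warehouse" "tf_warehouse" {
  name           = var.warehouse
  comment        = "Warehouse for ${var.env_name}"
  warehouse_size = "XSMALL"
  auto_suspend   = 60 # seconds of inactivity before suspending
  auto_resume    = true
}

# Lets the pre-created TF_READER role use the new database.
resource "snowflake_grant_privileges_to_account_role" "tf_reader_db_usage" {
  account_role_name = "TF_READER"
  privileges        = ["USAGE"]

  on_account_object {
    object_type = "DATABASE"
    object_name = snowflake_database.tf_database.name
  }
}
```

Referencing snowflake_database.tf_database.name in the grant, rather than repeating var.database, gives Terraform an implicit dependency so the database is created before the grant is applied.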
GitHub Actions Workflow:
The workflow will trigger a deployment to the DEV environment when you merge any code changes to the main branch using a pull request.
The workflow also includes a step that scans the Terraform code with the Checkov action. Checkov can scan infrastructure-as-code, open source packages, container images, and CI/CD configurations to identify misconfigurations, vulnerabilities, and license compliance issues. Here it is limited to the terraform framework, and soft_fail: true means findings are reported without failing the job.
security-scan-terraform-code:
  name: Security scan (terraform code)
  runs-on: ubuntu-latest
  steps:
    - name: Checkout repo
      uses: actions/checkout@v4
    - name: Run Checkov action
      id: checkov
      uses: bridgecrewio/checkov-action@master
      with:
        directory: .
        soft_fail: true
        download_external_modules: true
        framework: terraform
There is a preview step when you create a pull request: it runs terraform plan to give you an overview of which resources will be deployed or changed.
When you create a release/* branch from the main branch, this triggers a deployment to the PROD environment.
Deploying the resources to DEV:
Let us make some changes to the Terraform code, push the changes to the GitHub repo, and create a pull request (PR). Below is how the deployment pipeline looks:
And below are the steps performed as part of preview:
Let us merge our pull request to the main branch.
Here is the output of the terraform apply step:
Let us verify the resources on Snowflake. As you can see, the deployment pipeline created a database (TASTY_BYTES_DEV), a schema (RAW_POS), and a table (MENU).
A new warehouse was also provisioned.
Deploying the resources to PROD:
Let us create a release branch from the main branch. This will trigger a deployment to the PROD environment.
As mentioned earlier, there will be a preview step which runs terraform plan to give you an overview of which resources will be deployed or changed.
Since I have configured environment protection rules, the pipeline pauses for manual approval before triggering a deploy to PROD.
Approving this will trigger a deploy to PROD.
Here is the output of the terraform apply step (for PROD):
Completed pipeline:
Let us verify the resources on Snowflake for the PROD environment. As you can see, the deployment pipeline created a database (TASTY_BYTES_PROD), a schema (RAW_POS), and a table (MENU).
A new warehouse for PROD was also provisioned.
Conclusion:
Automating the deployment of Snowflake resources using Terraform and GitHub Actions streamlines the process, reduces the potential for errors, and ensures that infrastructure is managed consistently. This setup not only saves time but also enhances the reliability and reproducibility of deployments. By following the steps outlined in this guide, you can leverage the power of IaC and CI/CD to manage your Snowflake infrastructure efficiently.
Thanks for reading. Please share your feedback in the comments section.