Introduction:
In every organization, security and compliance guardrails are put in place to keep things aligned with client expectations and agreements. There are many types of guardrails and compliance parameters, and golden image creation is one of them. Before we dive deep, let's understand what a Golden Image is.
A Golden Image is basically an image with all required or supporting packages installed, such as monitoring agents, software utilities, and vulnerability-scanning agents, along with any other packages approved by the client. When you build a golden image for the first time, you have to make sure that all required tools are installed and running fine on that server (Windows/Linux) to support the environment, and that everything is aligned with the approved SOE parameters document. Besides verifying the installed packages, the OS also needs to be updated with the latest patches for the current month. Once all of this is done, take a snapshot of that instance and treat it as the base image, known as the Golden Image. This image is then used for future server build activity.
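The final snapshot step described above can be sketched with the AWS CLI. This is a minimal sketch only, not the pipeline itself: the instance ID is a hypothetical placeholder, and `echo` prints the command instead of executing it (a dry run).

```shell
#!/bin/sh
# Hedged sketch of the snapshot step. INSTANCE_ID is a placeholder;
# `echo` makes this a dry run so nothing is actually created.
INSTANCE_ID="i-0123456789abcdef0"
# After packages are installed and the OS is patched (e.g. `yum update -y`),
# snapshot the instance as the base image:
echo aws ec2 create-image --instance-id "$INSTANCE_ID" \
  --name "GoldenImage-$(date -u +%Y-%m)" --no-reboot
```

The real pipeline version of this command appears later in the TakeAMI stage.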
Diagram:
Prerequisites:
GitLab
Terraform
Ansible(optional)
AWS Cloud Platform
Guidelines:
In this project, I planned to build a golden image for the first time; since no image existed earlier, we are essentially starting from scratch. Below are the planned action items for this project ->
- Build an AWS EC2 instance using Terraform.
- Provision the EC2 instance using Ansible.
- Create a CI/CD pipeline to run the sequence of activities.
- Once provisioning is completed, take an AMI of the instance.
- Lastly, terminate the instance.
Note: Since this is the first run, Ansible is required because no OS hardening parameters have been implemented yet. Once the instance is provisioned with the latest patches and all security standards, and the image is created, Ansible will not be required for next month's activity, because the OS hardening parameters will already be baked into last month's image.
Build an Instance using Terraform
I have taken a sample base image (not last month's golden image) as a reference, fetched it using Terraform, and created a new EC2 instance.
var.tf
variable "instance_type" {
  description = "ec2 instance type"
  type        = string
  default     = "t2.micro"
}
data.tf:
## fetch AMI ID ##
data "aws_ami" "ami_id" {
  most_recent = true
  owners      = ["self"] # restrict the lookup to our own account

  filter {
    name   = "tag:Name"
    values = ["Golden-Image_2024-06-13"]
  }
}
## Fetch SG and Keypair ##
data "aws_key_pair" "keypair" {
  key_name           = "keypair3705"
  include_public_key = true
}

data "aws_security_group" "sg" {
  filter {
    name   = "tag:Name"
    values = ["management-sg"]
  }
}
## Fetch IAM role ##
data "aws_iam_role" "instance_role" {
  name = "CustomEC2AdminAccess"
}

## Fetch networking details ##
data "aws_vpc" "vpc" {
  filter {
    name   = "tag:Name"
    values = ["custom-vpc"]
  }
}

data "aws_subnet" "subnet" {
  filter {
    name   = "tag:Name"
    values = ["management-subnet"]
  }
}
instance.tf
resource "aws_iam_instance_profile" "test_profile" {
  name = "InstanceProfile"
  role = data.aws_iam_role.instance_role.name
}

resource "aws_instance" "ec2" {
  ami                         = data.aws_ami.ami_id.id
  instance_type               = var.instance_type
  associate_public_ip_address = true
  availability_zone           = "us-east-1a"
  key_name                    = data.aws_key_pair.keypair.key_name
  # use vpc_security_group_ids (not security_groups) when launching into a VPC subnet
  vpc_security_group_ids      = [data.aws_security_group.sg.id]
  iam_instance_profile        = aws_iam_instance_profile.test_profile.name
  subnet_id                   = data.aws_subnet.subnet.id
  user_data                   = file("userdata.sh")

  root_block_device {
    volume_size = 15
    volume_type = "gp2"
  }

  tags = {
    Name = "GoldenImageVM"
  }
}
output.tf
output "ami_id" {
  value = {
    id               = data.aws_ami.ami_id.image_id
    arn              = data.aws_ami.ami_id.arn
    image_loc        = data.aws_ami.ami_id.image_location
    state            = data.aws_ami.ami_id.state
    creation_date    = data.aws_ami.ami_id.creation_date
    image_type       = data.aws_ami.ami_id.image_type
    platform         = data.aws_ami.ami_id.platform
    owner            = data.aws_ami.ami_id.owner_id
    root_device_name = data.aws_ami.ami_id.root_device_name
    root_device_type = data.aws_ami.ami_id.root_device_type
  }
}

output "ec2_details" {
  value = {
    arn         = aws_instance.ec2.arn
    id          = aws_instance.ec2.id
    private_dns = aws_instance.ec2.private_dns
    private_ip  = aws_instance.ec2.private_ip
    public_dns  = aws_instance.ec2.public_dns
    public_ip   = aws_instance.ec2.public_ip
  }
}

output "key_id" {
  value = {
    id          = data.aws_key_pair.keypair.id
    fingerprint = data.aws_key_pair.keypair.fingerprint
  }
}

output "sg_id" {
  value = data.aws_security_group.sg.id
}

output "role_arn" {
  value = {
    arn = data.aws_iam_role.instance_role.arn
    id  = data.aws_iam_role.instance_role.id
  }
}
userdata.sh
#!/bin/bash
# Runs as root via EC2 user data, so the sudo prefixes are not strictly needed.
sudo yum install -y jq

## Fetch the gitlab-runner password from Parameter Store
## (add --with-decryption if the parameter is stored as a SecureString)
GITLAB_PWD=$(aws ssm get-parameter --name "gitlab-runner_password" --region 'us-east-1' | jq .Parameter.Value | xargs)

## Set the password for ec2-user
PASSWORD_HASH=$(openssl passwd -1 "$GITLAB_PWD")
sudo usermod --password "$PASSWORD_HASH" ec2-user

## Create the gitlab-runner user and set its password
USER='gitlab-runner'
sudo useradd -m -u 1001 -p "$PASSWORD_HASH" "$USER"

## Copy the Ansible SSH key so the GitLab runner can log in as gitlab-runner
sudo mkdir -p /home/$USER/.ssh
sudo chmod 700 /home/$USER/.ssh
Ansible_SSH_Key=$(aws ssm get-parameter --name "Ansible-SSH-Key" --region 'us-east-1' | jq .Parameter.Value | xargs)
echo "$Ansible_SSH_Key" > /home/$USER/.ssh/authorized_keys
sudo chown -R $USER:$USER /home/$USER/.ssh/
sudo chmod 600 /home/$USER/.ssh/authorized_keys
echo "StrictHostKeyChecking no" >> /home/$USER/.ssh/config
echo "$USER ALL=(ALL) NOPASSWD : ALL" > /etc/sudoers.d/00-$USER
## Temporarily allow root login and password auth; Ansible hardens this later
sudo sed -i 's/^#PermitRootLogin.*/PermitRootLogin yes/; s/^PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
sudo systemctl restart sshd
sleep 40
Here, we have used a shell script to install the prerequisites for Ansible, such as creating the gitlab-runner user, placing its SSH key, and granting sudo access.
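A quick note on the `jq .Parameter.Value | xargs` pattern used in the script: `jq` prints string values with their surrounding double quotes, and piping through `xargs` (with no command, so it defaults to `echo`) strips the quotes and whitespace. A minimal illustration, with no AWS call needed:

```shell
#!/bin/sh
# `jq .Parameter.Value` would emit the value as a quoted JSON string;
# xargs tokenizes its input, removing the quotes before echoing it back.
raw='"s3cr3t-value"'                 # sample of what jq would print
clean=$(printf '%s' "$raw" | xargs)
echo "$clean"                        # prints: s3cr3t-value
```

Alternatively, `jq -r .Parameter.Value` or `aws ssm get-parameter --query Parameter.Value --output text` produces the raw value directly, without the extra pipe.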
Provision EC2 instance using Ansible:
Note: Before triggering the Ansible job in GitLab, make sure you log in once from the GitLab runner to the server you just built. The gitlab-runner user is going to SSH into the new server for Ansible provisioning, and that step will fail with a connection error if this first manual login is not performed.
main.yml
---
- name: Set hostname
  hosts: server
  become: true
  gather_facts: false
  vars_files:
    - ../vars/variable.yml
  roles:
    - ../roles/hostnamectl

- name: Configure other services
  hosts: server
  become: true
  roles:
    - ../roles/ssh
    - ../roles/login_banner
    - ../roles/services
    - ../roles/timezone
    - ../roles/fs_integrity
    - ../roles/firewalld
    - ../roles/log_management
    - ../roles/rsyslog
    - ../roles/cron
    - ../roles/journald

- name: Start Prepatch
  hosts: server
  become: true
  roles:
    - ../roles/prepatch

- name: Start Patching
  hosts: server
  become: true
  roles:
    - ../roles/patch

- name: Start Postpatch
  hosts: server
  become: true
  roles:
    - ../roles/postpatch

- name: Reboot the server
  hosts: server
  become: true
  tasks:
    - reboot:
        msg: "Rebooting machine in 5 seconds"
Prepare GitLab CI/CD Pipeline:
There are five stages for the entire deployment activity. It starts with validation to make sure all required services are running as expected.
If they are, it proceeds to build the resource (EC2) using Terraform. Here, I have used Terraform Cloud to make things more reliable and store the state file in the managed backend provided by HashiCorp, but the Terraform CLI can be used without any issues.
After a successful resource build, provisioning is performed to implement basic security standards and complete the OS hardening process using the Ansible CLI.
Finally, once provisioning with patching is completed, a pipeline job takes an AMI using AWS CLI commands, and a last stage terminates the build instance.
Below are the required stages for this pipeline ->
- Validation
- InstanceBuild
- InstancePatching
- TakeAMI
- Terminate
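The InstancePatching stage below builds an Ansible inventory file from the instance's private IP before running the playbook. That step can be sketched in isolation; `write_inventory` is a hypothetical helper and the IP is a placeholder for what `aws ec2 describe-instances` would return:

```shell
#!/bin/sh
# write_inventory is a hypothetical helper: given an IP, it writes the
# two-line inventory that the pipeline's ansible-playbook call expects.
write_inventory() {
  ip="$1"; out="$2"
  printf '[server]\n%s\n' "$ip" > "$out"
}

# In the real pipeline the IP comes from:
#   aws ec2 describe-instances --filters "Name=tag:Name,Values=GoldenImageVM" \
#     --query "Reservations[0].Instances[0].PrivateIpAddress" --output text
write_inventory "10.0.1.25" ./inventory
cat ./inventory
```

The resulting file contains a `[server]` group header followed by the host IP, matching the `hosts: server` target used in main.yml.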
.gitlab-ci.yml
default:
  tags:
    - anirban

stages:
  - Validation
  - InstanceBuild
  - InstancePatching
  - TakeAMI
  - Terminate

job1:
  stage: Validation
  script:
    - sudo chmod +x check_version.sh
    - source check_version.sh
  except:
    changes:
      - README.md
  artifacts:
    when: on_success
    paths:
      - Validation_artifacts

job2:
  stage: InstanceBuild
  script:
    - sudo chmod +x BuildScript/1_Env.sh
    - source BuildScript/1_Env.sh
    - python3 BuildScript/2_CreateTFCWorkspace.py -vvv
  except:
    changes:
      - README.md
  artifacts:
    paths:
      - Validation_artifacts
      - content.tar.gz

job3:
  stage: InstancePatching
  script:
    # command substitution $(...) is required to capture the CLI output;
    # --output text returns the bare value, so no jq/xargs is needed
    - INSTANCE_PRIVATEIP=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=GoldenImageVM" --query "Reservations[0].Instances[0].PrivateIpAddress" --output text)
    - echo -e "[server]\n$INSTANCE_PRIVATEIP" > ./Ansible/inventory
    - ansible-playbook ./Ansible/playbook/main.yml -i ./Ansible/inventory
    - sudo chmod +x BuildScript/7_Cleanup.sh
  when: manual
  except:
    changes:
      - README.md
  artifacts:
    when: on_success
    paths:
      - Validation_artifacts
      - ./Ansible/inventory

job4:
  stage: TakeAMI
  script:
    - echo '------------Fetching Instance ID------------'
    - INSTANCE_ID=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=GoldenImageVM" --query "Reservations[0].Instances[0].InstanceId" --output text)
    - echo '----------Taking an Image of Instance-----------'
    - aws ec2 create-image --instance-id "$INSTANCE_ID" --name "GoldenImage" --description "Golden Image created on $(date -u +"%Y-%m-%dT%H:%M:%SZ")" --no-reboot --tag-specifications "ResourceType=image,Tags=[{Key=Name,Value=GoldenImage}]" "ResourceType=snapshot,Tags=[{Key=Name,Value=DiskSnaps}]"
  when: manual
  except:
    changes:
      - README.md

job5:
  stage: Terminate
  script:
    - echo '------------Fetching Instance ID------------'
    - INSTANCE_ID=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=GoldenImageVM" --query "Reservations[0].Instances[0].InstanceId" --output text)
    - echo '--------------------Terminating the Instance--------------------'
    - aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"
  when: manual
  except:
    changes:
      - README.md
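One caveat for the TakeAMI stage: `aws ec2 create-image` fails if an AMI with the same name already exists in the account and region, so a repeated monthly run with the fixed name "GoldenImage" would error out. A date-stamped name (my suggestion, not part of the original pipeline) avoids the collision:

```shell
#!/bin/sh
# Build a unique, date-stamped AMI name for each monthly run.
AMI_NAME="GoldenImage-$(date -u +%Y-%m-%d)"
echo "$AMI_NAME"
# job4 would then use:
#   aws ec2 create-image --instance-id "$INSTANCE_ID" --name "$AMI_NAME" ...
```

The data.tf filter on `tag:Name` would still match as long as the Name tag stays consistent, or it could filter on a name prefix with `most_recent = true`.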
Validation:
As per the images below, we can see the instance has been launched and provisioned successfully, and after that the AMI has been taken.
Conclusion:
So, we are at the end of this blog. I hope we all now have an idea of how a pipeline can be set up to build an image without any manual intervention. In the pipeline I have followed a Continuous Delivery approach, hence a few stages are set to be triggered manually. One thing to highlight: do not set the Ansible stage (job3) in GitLab to automatic; use the when: manual key to keep this stage manual. As I mentioned at the top, the Ansible stage requires the GitLab runner to log in to the newly built server. I could have added that as a command in the pipeline, but I chose instead to verify things by entering the server from the GitLab runner myself.
Hopefully you have enjoyed this blog; please go through it and do the hands-on for sure. Let me know how you felt, what went well, and where I could have done a little better. All responses are welcome.
For upcoming updates, please stay tuned and get in touch. In the meantime, let's dive into the GitHub repository below ->
Thanks Much!!
Anirban Das.