AWS Identity and Access Management (IAM) helps you securely control access to AWS resources, and Amazon ECS is no exception. IAM controls what can access ECS resources in your AWS accounts. IAM also controls which AWS resources ECS and tasks running in ECS can access. This will be the focus of this lab.
Two types of IAM roles are used by ECS:
- ECS task execution role: This role is used by the ECS agent to pull container images and send logs to CloudWatch.
- ECS task role: This role is used by the containers to access other AWS services they depend on at runtime.
In this lab, you will learn about the ECS IAM roles first-hand and diagnose and troubleshoot related issues.
Learning objectives
Upon completion, you will be able to:
- Explain ECS task execution roles and task roles
- Diagnose and debug IAM issues in ECS
- Resolve IAM issues in running ECS tasks
Prerequisites
Familiarity with the following topics is required to get the most out of this lab:
- AWS Identity and Access Management (IAM) fundamentals (roles and policies)
- Amazon Elastic Container Service (ECS) on AWS Fargate fundamentals
- Terraform fundamentals, with experience deploying on AWS
Environment Before
- ECS cluster that resembles the following diagram.
The ECS cluster contains three services that run a stock charting application:
- Frontend: React frontend that displays stock charts.
- API: RESTful API that provides stock data. The API is written in Java and uses the Spring Boot framework.
- Database: Persistence layer for (simulated) stock data. It is not usually advisable to run a database in a container, but for this lab, it is used to reduce the time needed to provision the lab compared to using RDS.
Each service maintains a desired count of running tasks.
The frontend and API services sit in public subnets behind a public-facing application load balancer (ALB), while the database service resides in a private subnet behind an internal-facing ALB. To access the application, you can navigate to the ALB's public URL once the lab has been set up completely.
The application works fine, but a new feature is being developed that requires the API service to have access to S3. The initial code checks if a given S3 bucket exists and creates it if not. This will be the focus of your investigation into IAM in ECS.
These ECS resources are deployed using Terraform. This lab step briefly highlights the resource configurations and Java application code related to IAM, both of which are referenced throughout this lab.
Reviewing the Sample Application Deployed on Amazon ECS With AWS Fargate
Instructions
I opened the lab's development environment, then opened the template.tf file in the editor:
# Abbreviated template emphasizing IAM resources
resource "aws_ecs_cluster" "ecs_cluster" {
  name = "lab-cluster"
}

resource "aws_ecs_task_definition" "api_task_definition" {
  family                   = "lab-api"
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.ecs_task_role.arn
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  ...
  container_definitions = jsonencode([
    {
      ...
      environment = [
        { name = "BUCKET_NAME", value = "lab-experimental-${data.aws_caller_identity.current.id}" } # e.g. lab-experimental-123456789012
      ]
      ...
    }
  ])
}

# IAM
data "aws_iam_policy_document" "assume_role_policy" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

data "aws_iam_policy_document" "ecs_task_policy" {
  statement {
    effect = "Allow"
    resources = [
      "arn:aws:s3:::${local.bucket_name}",
    ]
    actions = [
      "s3:GetBucketAcl",
    ]
  }
}

data "aws_iam_policy_document" "ecs_task_create_bucket_policy" {
  statement {
    effect = "Allow"
    resources = [
      "arn:aws:s3:::${local.bucket_name}",
    ]
    actions = [
      "s3:CreateBucket",
    ]
  }
}

resource "aws_iam_role" "ecs_task_execution_role" {
  name               = "lab-ecs-task-execution-role"
  assume_role_policy = data.aws_iam_policy_document.assume_role_policy.json
}

resource "aws_iam_role" "ecs_task_role" {
  name               = "lab-ecs-task-role"
  assume_role_policy = data.aws_iam_policy_document.assume_role_policy.json
  inline_policy {
    name   = "task-role-policy"
    policy = data.aws_iam_policy_document.ecs_task_policy.json
  }
}

resource "aws_iam_role_policy_attachment" "ecs_task_execution_role_policy" {
  role       = aws_iam_role.ecs_task_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

resource "aws_iam_policy" "ecs_task_create_bucket_policy" {
  name        = "lab-ecs-s3-task-policy"
  description = "Allows S3 actions for ECS tasks"
  policy      = data.aws_iam_policy_document.ecs_task_create_bucket_policy.json
}
...
Here is an overview of the relevant Terraform resource configurations. The line numbers below refer to template.tf as shown in the lab's editor.
Starting with the aws_ecs_task_definition.api_task_definition resource on line 7, the two arguments for configuring the roles are execution_role_arn and task_role_arn.
The task_role_arn provides the task's containers access to other AWS services. The execution_role_arn is used by the ECS container agent to pull container images from ECR and send logs to CloudWatch.
On line 19, an environment variable is set to configure the name of the S3 bucket that the API service will use.
The assume role policy, or trust policy, for both the task execution role and the task role is configured beginning on line 28.
The trust policy allows the ECS service to assume the role. Notice that both types of roles are assumed by the ecs-tasks.amazonaws.com service principal, so only one trust policy is needed.
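To confirm the shared trust policy on the deployed roles, the AWS CLI can print it once credentials are configured in the terminal (this is done later in this step). This is a minimal sketch: `show_trust_policy` is a hypothetical helper name, and the role names are the ones set in template.tf.

```shell
# Print the trust (assume role) policy of an IAM role as JSON.
# `show_trust_policy` is a hypothetical helper, not part of the lab.
show_trust_policy() {
  aws iam get-role \
    --role-name "$1" \
    --query 'Role.AssumeRolePolicyDocument' \
    --output json
}
```

Running `show_trust_policy lab-ecs-task-role` and `show_trust_policy lab-ecs-task-execution-role` should show the same statement for both roles, with ecs-tasks.amazonaws.com as the service principal.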
Below that, on lines 39-61, are two aws_iam_policy_document data sources that each define a simple single-action policy statement for S3.
The ecs_task_policy grants the s3:GetBucketAcl action on the API's S3 bucket, while the ecs_task_create_bucket_policy grants s3:CreateBucket. Together, these two policies allow checking if a bucket exists and creating it if not.
The task execution role (ecs_task_execution_role) and task role (ecs_task_role) are configured on lines 63-76.
Both roles configure the same trust policy with the assume_role_policy argument. The ecs_task_role also has an inline_policy block referencing the ecs_task_policy data source. This policy grants the API service initial access to check if the lab's S3 bucket exists.
On lines 78-81, the task execution role has an AWS-managed policy attached to it.
This policy grants the ECS container agent access to pull container images from ECR and send logs to CloudWatch. The AWS-managed AmazonECSTaskExecutionRolePolicy can be viewed in the AWS managed policy reference. AWS-managed policies are a convenient way to grant common permissions to a role, but you should use them with caution in production, where least-privilege policies are preferred. For example, you may want to restrict image pulls to a specific ECR repository.
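The managed policy's statements can also be inspected from the terminal once the AWS CLI is configured. The sketch below assumes the caller is allowed to read IAM policies; `show_policy_document` is a hypothetical helper name.

```shell
# Print the default (current) version of an IAM policy document as JSON.
# `show_policy_document` is a hypothetical helper, not part of the lab.
show_policy_document() {
  policy_arn="$1"
  # Managed policies are versioned, so look up the default version first.
  version_id=$(aws iam get-policy \
    --policy-arn "$policy_arn" \
    --query 'Policy.DefaultVersionId' \
    --output text)
  aws iam get-policy-version \
    --policy-arn "$policy_arn" \
    --version-id "$version_id" \
    --query 'PolicyVersion.Document' \
    --output json
}
```

For example, `show_policy_document arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy` prints the ECR and CloudWatch Logs actions the policy allows, which is a useful starting point when writing a narrower custom policy.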
Lastly, a custom IAM policy is created from the ecs_task_create_bucket_policy data source policy document for use later on in the lab.
Next, we open the Java file to view the relevant source code for accessing S3:
package com.cloudacademy.stocks.utils;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.AmazonS3Exception;
import com.amazonaws.services.s3.model.Bucket;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class S3Initializer {
    private final AmazonS3 amazonS3;

    @Autowired
    public S3Initializer(AmazonS3 amazonS3) {
        this.amazonS3 = amazonS3;
    }

    // Checks for the bucket and creates it if missing, retrying every 5 seconds on S3 errors.
    public void init(String bucketName) {
        while (true) {
            try {
                try {
                    if (!amazonS3.doesBucketExistV2(bucketName)) {
                        Bucket bucket = amazonS3.createBucket(bucketName);
                        System.out.println("Bucket created: " + bucketName);
                        break;
                    } else {
                        System.out.println("Bucket already exists: " + bucketName);
                        break;
                    }
                } catch (AmazonS3Exception e) {
                    // Includes AccessDenied errors caused by missing IAM permissions
                    System.err.println("Error creating bucket: " + e.getMessage());
                }
                Thread.sleep(5000);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                System.err.println("Thread interrupted");
                return; // stop retrying; otherwise sleep() would throw again immediately
            }
        }
    }
}
- The application uses the aws-java-sdk-s3 library to interact with S3.
- The init method checks if the S3 bucket exists (the doesBucketExistV2 call) and creates it if not (the createBucket call).
- The process is repeated every 5 seconds until the bucket is created or found to exist. This ensures a steady stream of error log messages if there are any IAM permission issues.
- The doesBucketExistV2 method uses the GetBucketAcl API action, as seen in the aws-java-sdk source code on GitHub and also explained in the AWS SDK for Java API Reference.
In practice, it may be preferred to create the bucket using Terraform. However, this application code is a convenient way to demonstrate IAM concepts in ECS on Fargate.
Up next, you will configure AWS credentials in this IDE's terminal to provide access to the AWS CLI used in the next lab step.
So we configure the AWS CLI:
aws configure set aws_access_key_id {REDACTED} &&
aws configure set aws_secret_access_key {REDACTED} &&
aws configure set default.region us-west-2
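Before moving on, it can help to confirm that the configured credentials actually work. A minimal sketch, assuming the AWS CLI is installed in the IDE terminal; `whoami_aws` is a hypothetical helper name.

```shell
# Print the account ID and caller ARN for the configured credentials.
# `whoami_aws` is a hypothetical helper, not part of the lab.
whoami_aws() {
  aws sts get-caller-identity \
    --query '[Account, Arn]' \
    --output text
}
```

Running `whoami_aws` should print the lab account ID and the ARN of the lab user. An error at this point indicates a problem with the credentials rather than with ECS.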
Detecting the Task's IAM Issue
A solid understanding of the differences between the task execution role and task role is essential to understanding the IAM issue in ECS.
The task execution role grants permissions to the Fargate container agent. Typical permissions provided by the task execution role include:
- Pulling container images from private Amazon Elastic Container Registry (ECR) repositories
- Sending logs to Amazon CloudWatch
These two use cases are what the AWS-managed policy AmazonECSTaskExecutionRolePolicy allows. Other use cases may require accessing AWS Secrets Manager, for example. In this case, you must create a custom task execution role with a custom policy attached to it. More details are provided in the official documentation.
The task role, in contrast, is used by the task's containers to access other AWS services they depend on at runtime, such as Amazon Simple Storage Service (S3), Amazon Relational Database Service (RDS), and Amazon DynamoDB.
This lab step inspects the lab environment to identify an IAM issue.
Instructions
In the AWS Management Console, navigate to the lab's ECS cluster:
All the services should be Active. If not, refresh the Services table every minute or so until they are.
Click api-Service to view the API service's details.
On the Logs tab, note that logs are displayed.
The fact that log messages are present indicates that the task execution role allows the creation of CloudWatch log streams and putting log events. In fact, insufficient permissions on the task execution role often result in the task failing to start.
Click the Deployments and events tab and scroll down to the events table:
In case of task execution role issues, you can click on the task IDs presented in the events table to view error messages in the task overview panel that caused the task to fail.
Return to the API service's Logs tab and observe what log messages are displayed:
The log messages indicate that the application is unable to create the S3 bucket. The message is printed by the application code that catches the AmazonS3Exception within the retry loop. When errors aren't so obvious, you can search for errors directly from the logs view or open the logs in CloudWatch Logs for more advanced search capabilities.
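The same search can be run from the terminal. The sketch below assumes the tasks use the awslogs log driver; the log group name is not shown in the template excerpt, so it is passed in as an argument (check the task definition's awslogs-group option for the real value). `find_bucket_errors` is a hypothetical helper name.

```shell
# Search a CloudWatch Logs log group for the application's S3 error message.
# The quoted filter pattern matches the exact phrase printed by the
# application's catch block.
find_bucket_errors() {
  aws logs filter-log-events \
    --log-group-name "$1" \
    --filter-pattern '"Error creating bucket"' \
    --query 'events[].message' \
    --output text
}
```

Calling `find_bucket_errors <log-group-name>` prints only the matching messages, which is easier to scan than the merged log view when the application is noisy.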
Because the createBucket method is attempted, the doesBucketExistV2 method must have succeeded and returned false.
This confirms that the task's container successfully authenticated to S3 using the task role. Recall that the task role initially only has permission to check if the bucket exists.
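The task role's effective permissions can also be checked without waiting for the application to fail, using the IAM policy simulator. This is a sketch under two assumptions: the caller has the iam:SimulatePrincipalPolicy permission, and the bucket name follows the lab-experimental-<account id> pattern from the template. `simulate_task_role` is a hypothetical helper name.

```shell
# Ask IAM how the task role's policies evaluate for the two S3 actions
# the application uses. Prints one line per action with its decision.
simulate_task_role() {
  account_id=$(aws sts get-caller-identity --query Account --output text)
  aws iam simulate-principal-policy \
    --policy-source-arn "arn:aws:iam::${account_id}:role/lab-ecs-task-role" \
    --action-names s3:GetBucketAcl s3:CreateBucket \
    --resource-arns "arn:aws:s3:::lab-experimental-${account_id}" \
    --query 'EvaluationResults[].[EvalActionName, EvalDecision]' \
    --output text
}
```

With only the inline policy in place, the simulator would be expected to report an allowed decision for s3:GetBucketAcl and an implicit deny for s3:CreateBucket, matching the behavior seen in the logs.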
Observe in the Task column that more than one task ID is displayed:
By default, all task logs are merged and displayed. You can click on one of the task IDs to view the logs for that specific task when needed.
When the logs don't clearly indicate an IAM issue, you can use CloudTrail to identify failed AWS API calls. Note that there can be several minutes delay between a failed API call and when its corresponding event appears in CloudTrail.
Navigate to the CloudTrail event history:
The application code's attempts to create the S3 bucket are visible in the event history. By default, you can't tell from the table view if an API call failed. The errorCode field indicates if a failure occurred, and its column can be added to the table.
Click the cog to the upper-right of the table:
This opens the Preferences dialog.
Toggle the Error Code column on and click Confirm:
Now the AccessDenied errors are visible in the last column of the table.
You may be wondering why the API calls for checking if the bucket exists are not included in the table. By default, read-only events are excluded, but you can view them by changing the Read-only lookup attribute to true.
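The read-only events can be listed from the terminal as well. A minimal sketch using the ReadOnly lookup attribute; `list_read_only_events` is a hypothetical helper name, and the default of 10 results is an arbitrary choice.

```shell
# List recent read-only CloudTrail events (time, name, source service).
# The optional first argument is the maximum number of results.
list_read_only_events() {
  aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=ReadOnly,AttributeValue=true \
    --max-results "${1:-10}" \
    --query 'Events[].[EventTime, EventName, EventSource]' \
    --output text
}
```

Note that lookup-events accepts only one lookup attribute per call, so filtering on both ReadOnly and a specific event name such as GetBucketAcl requires post-processing the output.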
In your IDE terminal, enter the following command to view the last event from CloudTrail:
aws cloudtrail lookup-events --max-results 1
In this example, the last event was a CreateBucket API call. The CloudTrailEvent field contains the errorCode field, but because CloudTrailEvent is returned as a JSON-encoded string, it cannot be filtered using the AWS CLI alone. You may have CloudTrail configured to send events to other services with more advanced filtering capabilities. If not, you could use jq to filter the JSON output.
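One way the jq filtering could look is sketched below, assuming jq is installed in the terminal. `failed_api_calls` is a hypothetical helper name; it reads lookup-events JSON on standard input.

```shell
# Read `aws cloudtrail lookup-events` JSON on stdin and print one
# tab-separated line (time, event name, error code) per failed call.
# CloudTrailEvent is a JSON document stored as a string, so it is parsed
# with jq's fromjson before the errorCode field can be inspected.
failed_api_calls() {
  jq -r '.Events[].CloudTrailEvent
         | fromjson
         | select(.errorCode != null)
         | [.eventTime, .eventName, .errorCode]
         | @tsv'
}
```

For example, `aws cloudtrail lookup-events --max-results 20 | failed_api_calls` lists only the calls that returned an error, such as the AccessDenied responses to CreateBucket.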
You learned how to identify an IAM issue in ECS. You learned how to view logs and events in the ECS console and how to use CloudTrail to identify failed API calls.
Resolving the Task's IAM Issue
To correct the IAM issue causing the application to fail to create the S3 bucket, you will update the task role in this lab step. Recall that a least-privilege IAM policy (lab-ecs-s3-task-policy) with permission to create the lab S3 bucket has already been created in the lab's Terraform template.
In your IDE terminal, enter the following to attach the lab-ecs-s3-task-policy to the task role:
account_id=$(aws sts get-caller-identity --query Account --output text)
aws iam attach-role-policy --role-name lab-ecs-task-role --policy-arn arn:aws:iam::$account_id:policy/lab-ecs-s3-task-policy
Similarly, you could use the aws_iam_role_policy_attachment Terraform resource to attach the policy to the task role. In practice, you may also consider merging the s3:CreateBucket permission into the existing role policy.
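To verify that the attachment took effect, the role's inline and managed policies can be listed side by side. A minimal sketch; `show_task_role_policies` is a hypothetical helper name, and the role name defaults to the one from template.tf.

```shell
# List the inline and attached managed policies of an IAM role.
# Inline and managed policies are returned by two separate IAM calls.
show_task_role_policies() {
  role="${1:-lab-ecs-task-role}"
  echo "Inline policies:"
  aws iam list-role-policies \
    --role-name "$role" \
    --query 'PolicyNames' \
    --output text
  echo "Attached managed policies:"
  aws iam list-attached-role-policies \
    --role-name "$role" \
    --query 'AttachedPolicies[].PolicyName' \
    --output text
}
```

After the attach-role-policy command above, `show_task_role_policies` should list task-role-policy as the inline policy and lab-ecs-s3-task-policy as an attached managed policy.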
Return to the API service's Logs tab and periodically refresh the logs until you observe the "Bucket created" and "Bucket already exists" messages printed by the application.
It can take a minute or two for the new permissions to propagate to the task's container. Once the permissions have propagated, the first task will successfully create the S3 bucket, and the second task will detect it exists.
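The result can also be confirmed from the terminal. The sketch below assumes the lab user is allowed to call HeadBucket on the bucket; `bucket_exists` is a hypothetical helper name.

```shell
# Succeed (exit status 0) if the bucket exists and is accessible to the caller.
# head-bucket fails both when the bucket is missing and when access is denied,
# so a failure here is not proof that the bucket does not exist.
bucket_exists() {
  aws s3api head-bucket --bucket "$1" >/dev/null 2>&1
}
```

For example, after storing the account ID in `account_id` as in the earlier command, `bucket_exists "lab-experimental-$account_id" && echo "bucket exists"` prints a confirmation once one of the tasks has created the bucket.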
You resolved the IAM issue by attaching a policy to the ECS task role.
Environment After
Conclusion
Properly configuring IAM for Amazon ECS on AWS Fargate is crucial for maintaining the security and compliance of your containerized applications.
By carefully defining task execution and task IAM roles, you can grant your containers the necessary permissions to access AWS resources while minimizing exposure.
This article has explored the fundamental concepts of IAM in the context of Fargate, including:
- Understanding the difference between task execution and task IAM roles.
- Creating IAM policies to grant specific permissions.
- Best practices for managing IAM roles and policies.
By following these guidelines and tailoring them to your specific application requirements, you can establish a robust IAM strategy that protects your sensitive data and ensures the security of your containerized workloads on AWS Fargate.
Remember, the principle of least privilege should always guide your IAM decisions. Granting only the necessary permissions to your containers will reduce the potential attack surface and enhance overall security.