Connecting a container in Kubernetes to a relational database server is straightforward today: create a SQL user and open a TCP connection. On Amazon Web Services, the equivalent is connecting a container in an Amazon EKS cluster to an Amazon RDS instance.
Several points need to be considered when setting up this connectivity.
Which network topology should we choose? How do we authenticate and authorize the connection to the RDS instance? Can a private RDS instance be exposed publicly?
Which architecture is the most efficient, maintainable and scalable?
Scenarios
Amazon RDS supports the following scenarios for accessing a DB instance in a VPC [1]:
- An EC2 instance in the same VPC
- An EC2 instance in a different VPC
- A client application through the internet
- A private network
The scenarios that concern us are the first two:
- Amazon EKS and Amazon RDS in the same VPC.
- Amazon EKS and Amazon RDS in different VPCs.
In the first one, Kubernetes workloads communicate directly with the Amazon RDS instance.
In the second scenario, a peering connection is needed between the two VPCs. The Amazon RDS instance does not need to be public for this to work: traffic crosses the peering connection on private IP addresses.
Let's discover the possible architectures that could be used to implement each scenario.
Direct communication
In this architecture, our Amazon RDS instance is isolated in its own subnet and is reachable only on a private IP address range, and only by the Amazon EKS workloads that require access to it. Pods reach Amazon RDS directly, using VPC DNS resolution.
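To make the "private IP address range only" rule concrete, here is a minimal Python sketch of the check that a security group or network ACL rule performs on the source address. The CIDR blocks and the function name are hypothetical, chosen only for illustration; they are not taken from the actual stack.

```python
import ipaddress
from typing import Iterable

# Hypothetical CIDR blocks for the EKS private subnets (illustration only).
EKS_PRIVATE_SUBNETS = ["10.0.2.0/24", "10.0.3.0/24"]


def is_allowed_source(ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Return True when ip falls inside one of the allowed CIDR blocks."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


if __name__ == "__main__":
    # A pod IP from an EKS private subnet, then an address outside the VPC.
    print(is_allowed_source("10.0.3.25", EKS_PRIVATE_SUBNETS))
    print(is_allowed_source("203.0.113.10", EKS_PRIVATE_SUBNETS))
```

In the real architecture this decision is made by AWS networking, not by application code; the sketch only shows the logic of the rule.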
VPC Peering
In this architecture, our Amazon RDS instance is isolated in its own VPC. The two VPCs are peered. Pods reach Amazon RDS by its DNS name, which resolves across the peering connection.
Private Links
In this architecture, our Amazon RDS instance is also isolated in its own VPC. The two VPCs are connected using AWS PrivateLink. Amazon RDS is accessed through a Network Load Balancer on a private IP address. A Lambda function is responsible for registering and deregistering the load balancer targets [2].
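The registration/deregistration step is essentially a reconciliation between the IP addresses the RDS endpoint currently resolves to and the IP addresses already registered in the target group. The following Python sketch shows only that comparison; the function name is hypothetical, and the DNS lookup and the calls to the load balancer API that a real Lambda would make are deliberately left out.

```python
from typing import Iterable, List, Tuple


def plan_target_changes(
    resolved_ips: Iterable[str], registered_ips: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Compare resolved and registered IPs.

    Returns (to_register, to_deregister), each sorted for stable output.
    """
    resolved = set(resolved_ips)
    registered = set(registered_ips)
    to_register = sorted(resolved - registered)
    to_deregister = sorted(registered - resolved)
    return to_register, to_deregister


if __name__ == "__main__":
    # Hypothetical situation after a failover: the endpoint now resolves
    # to a new private IP while the old one is still registered.
    add, remove = plan_target_changes(["10.0.5.20"], ["10.0.4.12"])
    print("register:", add)
    print("deregister:", remove)
```

When the two sets are equal, both lists are empty and nothing needs to change, which is the common case between failovers.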
Each architecture has its own advantages and disadvantages, but all of them apply network isolation best practices for securing sensitive data in Amazon RDS [3].
Let's explore scenario 1.
Scenario 1 in detail
In the scenario 1 architecture, network isolation is achieved using network ACLs and security groups. We can go further by combining pod security groups with IAM roles for service accounts, which gives a pod-level defense-in-depth security strategy at both the networking and authentication layers.
- We associate an IAM role with a Kubernetes service account. This service account can then provide AWS permissions to the containers in any pod that uses that service account [5].
- Security groups for pods integrate Amazon EC2 security groups with Kubernetes pods. We use security groups to define rules that allow inbound and outbound network traffic to and from pods [6].
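As a rough illustration of these two building blocks, the Python sketch below builds the corresponding Kubernetes manifests as dictionaries and prints them as JSON, which kubectl accepts alongside YAML. The helper names, the `metabase` names, the role ARN and the security group ID are placeholders, not values from the actual deployment.

```python
import json
from typing import Dict, Iterable


def service_account(name: str, namespace: str, role_arn: str) -> dict:
    """ServiceAccount annotated with the IAM role its pods should assume."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


def security_group_policy(
    name: str, namespace: str, pod_labels: Dict[str, str], group_ids: Iterable[str]
) -> dict:
    """SecurityGroupPolicy attaching EC2 security groups to matching pods."""
    return {
        "apiVersion": "vpcresources.k8s.aws/v1beta1",
        "kind": "SecurityGroupPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": dict(pod_labels)},
            "securityGroups": {"groupIds": list(group_ids)},
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    sa = service_account(
        "metabase", "metabase", "arn:aws:iam::111122223333:role/example-role"
    )
    sgp = security_group_policy(
        "metabase", "metabase", {"app": "metabase"}, ["sg-0123456789abcdef0"]
    )
    print(json.dumps(sa, indent=2))
    print(json.dumps(sgp, indent=2))
```

The service account handles the authentication layer and the security group policy handles the networking layer, so a pod needs both to match before it can reach the database.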
We could implement the same security pattern in the VPC peering scenario, but not in the private links scenario.
Now that we have a clear idea of the concepts, let's implement this architecture.
Prerequisites
- AWS CLI, installed and configured
- Terraform
- kubectl
- Kustomize
- PostgreSQL client
Architecture
The overall architecture that we will implement during this series of articles is as follows:
In addition to the previous version, we will deploy:
- NAT gateways in the EKS public subnets
- An external Network Load Balancer that exposes the RDS instance only to specific IP address ranges. A Lambda function will populate the NLB target group with the RDS private IP address.
Objectives
During this section of the workshop:
With Terraform
- We will create a VPC with eight subnets:
  - 2 public and 2 private subnets for Amazon EKS.
  - 2 public and 2 private subnets for Amazon RDS.
- An Internet Gateway attached to the VPC.
- NAT gateways in the EKS public subnets, but not in the RDS public subnets, since Amazon RDS doesn't need to reach the public internet.
- A network ACL for each pair of subnets.
- An EKS cluster and EKS node groups.
- A multi-AZ RDS PostgreSQL instance.
- An external Network Load Balancer and the Lambda function described previously.
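To get a feel for the eight-subnet layout before writing any Terraform, the following Python sketch carves a VPC CIDR block into eight equal subnets. The VPC range, the /24 subnet size, the subnet names and the two-availability-zone split are all assumptions made for this example; the real values are defined in the Terraform code.

```python
import ipaddress
from typing import Dict


def plan_subnets(vpc_cidr: str, new_prefix: int = 24) -> Dict[str, str]:
    """Assign the first eight subnets of the VPC to named tiers."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # Hypothetical naming: service, visibility, availability zone suffix.
    names = [
        f"{service}-{visibility}-{az}"
        for service in ("eks", "rds")
        for visibility in ("public", "private")
        for az in ("a", "b")
    ]
    blocks = vpc.subnets(new_prefix=new_prefix)
    # zip stops after the eight names, so only eight blocks are consumed.
    return {name: str(block) for name, block in zip(names, blocks)}


if __name__ == "__main__":
    for name, cidr in plan_subnets("10.0.0.0/16").items():
        print(f"{name:15} {cidr}")
```

Keeping the EKS and RDS subnets in separate, contiguous ranges makes the network ACL rules between the two tiers easier to express.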
With kubectl
- We will create a Kubernetes service account annotated with an IAM role that has the necessary permission to connect to the database.
- A Security Group Policy that assigns Amazon EC2 security groups to a pod.
- The Metabase application: a Kubernetes deployment that connects to our PostgreSQL database.
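The "necessary permission to connect to the database" could look like the policy built by the Python sketch below. It assumes that IAM database authentication is used, in which case the permission is the `rds-db:connect` action on a specific database user. The helper name and all identifiers passed to it are placeholders.

```python
import json


def rds_connect_policy(
    region: str, account_id: str, db_resource_id: str, db_user: str
) -> dict:
    """IAM policy document allowing IAM database authentication as db_user.

    Assumes IAM database authentication is enabled on the RDS instance.
    """
    resource = (
        f"arn:aws:rds-db:{region}:{account_id}:dbuser:{db_resource_id}/{db_user}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["rds-db:connect"],
                "Resource": [resource],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    policy = rds_connect_policy(
        "eu-west-1", "111122223333", "db-ABCDEFGHIJKL01234", "metabase"
    )
    print(json.dumps(policy, indent=2))
```

Scoping the resource to a single database user keeps the role attached to the service account as narrow as possible.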
The series is divided into five parts:
- Configuring an Isolated Network in AWS
- Creating an Amazon EKS Cluster with Managed Node Group using Terraform
- Securing Sensitive Data in Amazon RDS
- Combining Pod Security Groups with AWS IAM Roles for Service Accounts
Conclusion
In this first part, we discussed possible scenarios for securing communication between Amazon EKS workloads and Amazon RDS databases. In the next part, we'll implement our network stack using Terraform.
Documentation
[1] https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.Scenarios.html
[2] https://aws.amazon.com/fr/blogs/networking-and-content-delivery/using-static-ip-addresses-for-application-load-balancers/
[3] https://aws.amazon.com/blogs/database/best-practices-for-securing-sensitive-data-in-aws-data-stores/
[4] https://aws.amazon.com/blogs/database/applying-best-practices-for-securing-sensitive-data-in-amazon-rds/
[5] https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/
[6] https://aws.amazon.com/blogs/containers/introducing-security-groups-for-pods/