Network topology is one of the most critical decisions in the life of an organization on Google Cloud.
If you implement the wrong topology for your business, fixing it later will cost a lot of money and deliver no business value.
Let's start with a simple use case. A customer wanted to keep things simple (or get a quick return on investment from a PoC 😨) and created one Shared VPC for all workloads and environments. The Shared VPC is connected to an on-premise environment using Cloud VPN. The network architecture was built like this:
One subnet per environment. 30 critical microservices run in production and depend on sensitive data stored in a Cloud SQL database, in addition to some stateful applications like Elasticsearch.
For security reasons, the customer now wants to isolate the production workload in a separate VPC. An external service provider was engaged to perform the migration. No downtime is accepted.
The service provider analyzed the current network architecture and proposed this migration plan:
1. Audit the workloads to prioritise them and document the workload dependency architecture,
2. Create a new GCP network project and a project for production v2,
3. Initialize data replication between Cloud SQL instance v1 and v2,
4. Initialize cross-cluster replication between Elasticsearch v1 and v2,
5. Deploy a microservice in prod v2, pointing to database v1 and Elasticsearch v1,
6. Split the external traffic in Load Balancer v1 between microservice v1 and v2,
7. Repeat steps 5 and 6 for each microservice,
8. Deploy the stateful applications in prod v2,
9. Once all microservices are in prod v2, do the final cutover:
   - Switch DNS,
   - Promote Cloud SQL instance v2 to master database,
   - Point all microservices to Cloud SQL instance v2 and Elasticsearch v2.
Depending on the dependencies between the microservices, such a migration may take several months and be very expensive.
No value for the business, no value for the end user. The customer will probably postpone the migration.
I met a customer with a more complex network topology: the development environment was placed in a European subnet and the production environment in a London subnet. To meet GDPR constraints, the customer wanted to migrate the workloads from London to a European subnet. Because the migration brought no value to the business, the required budget could not be justified. But the customer will still have to do it later, whatever the price.
So the best network topology that you can implement in Google Cloud should be:
- Scalable, secure and reliable,
- As isolated as possible from the public internet,
- Designed to be connectable to a private network, whether you have one today or want one in the future,
- Able to meet the unique requirements of your enterprise workloads [1],
- Suited to the architecture patterns that you intend to apply.
In a hybrid cloud or multi-cloud architecture, hub and spoke is the common network topology encouraged by cloud providers and the networking community.
What is the hub and spoke network topology?
"The spoke-hub distribution paradigm is a form of transport topology optimization in which traffic planners organize routes as a series of "spokes" that connect outlying points to a central "hub"." (Wikipedia [2])
In a multi-cloud or hybrid cloud architecture, a set of spoke VPC networks communicate with the external environment through a hub VPC network. The relevant routes are exported from the hub VPC network into the spoke VPC networks. [4]
In this post, we will implement the following architecture in Google Cloud: a hub-and-spoke architecture with VPC peering and a segmentation based on environments:
- Each spoke represents a larger network segment.
- Spokes are isolated from each other because VPC peering is non-transitive.
- Within each spoke, the connectivity between workloads is separated with VPC Firewall rules.
- The hub is a custom VPC peered to the spokes, which are Shared VPCs.
- Making use of Shared VPC helps keep the design scalable and simple.
- Spokes are connected to the hub with VPC peering to ensure low latency and minimal management overhead.
- The hub VPC is connected to the on-premise network through a static VPN connection. It could be replaced by a dynamic VPN connection or an Interconnect.
- The hub VPC is isolated from the public Internet with explicit VPC Firewall rules.
- Network services are centrally administered for connectivity between spokes and on-premise.
- To allow a path between spokes and on-premise, custom route exchange is configured between the hub and each spoke.
The following resource hierarchy is used in this example:
We use Terraform to build the infrastructure and GitLab CI to deploy it. Let's start by creating our hub.
Hub Network Project
Create a new project mycompany-network-hub:
gcloud projects create mycompany-network-hub
gcloud compute networks delete default --project mycompany-network-hub
The following files create:
- Custom VPC and a subnet.
- VPN Tunnel.
- Firewall rules to deny ingress and egress traffic.
repo-mycompany-network-hub/plan/project.tf
data "google_project" "hub" {
  project_id = "mycompany-network-hub"
}
repo-mycompany-network-hub/plan/vpc.tf
resource "google_compute_network" "hub" {
  name = "hub"
  auto_create_subnetworks = false
  project = data.google_project.hub.project_id
}

resource "google_compute_subnetwork" "hub-subnet" {
  name = "hub-subnet"
  ip_cidr_range = var.hub_subnet_ip_range
  region = var.region
  network = google_compute_network.hub.id
  project = data.google_project.hub.project_id
}
Note: Classic VPN is deprecating certain functionality on October 31, 2021. For more information, see the Classic VPN partial deprecation page.
repo-mycompany-network-hub/plan/vpn.tf
resource "google_compute_vpn_tunnel" "tunnel" {
  name = "tunnel"
  peer_ip = var.on_premise_peer_ip
  shared_secret = data.google_secret_manager_secret_version.vpn-shared-secret.secret_data
  project = data.google_project.hub.project_id
  ike_version = 2
  # Assumption to verify: for spoke traffic to use this tunnel, the spoke ranges
  # likely need to be listed in the local selector as well (and mirrored on-premise).
  remote_traffic_selector = [var.on_premise_network_ip_range]
  local_traffic_selector = [var.hub_subnet_ip_range]
  target_vpn_gateway = google_compute_vpn_gateway.target_gateway.id
  region = var.region
  # The tunnel can only be created once the three forwarding rules exist.
  depends_on = [
    google_compute_forwarding_rule.fr_esp,
    google_compute_forwarding_rule.fr_udp500,
    google_compute_forwarding_rule.fr_udp4500,
  ]
}

resource "google_compute_vpn_gateway" "target_gateway" {
  name = "vpn"
  network = google_compute_network.hub.id
  project = data.google_project.hub.project_id
  region = var.region
}

resource "google_compute_forwarding_rule" "fr_esp" {
  name = "fr-esp"
  ip_protocol = "ESP"
  ip_address = data.google_compute_address.vpn-static-ip.address
  target = google_compute_vpn_gateway.target_gateway.id
  project = data.google_project.hub.project_id
  region = var.region
}

resource "google_compute_forwarding_rule" "fr_udp500" {
  name = "fr-udp500"
  ip_protocol = "UDP"
  port_range = "500"
  ip_address = data.google_compute_address.vpn-static-ip.address
  target = google_compute_vpn_gateway.target_gateway.id
  project = data.google_project.hub.project_id
  region = var.region
}

resource "google_compute_forwarding_rule" "fr_udp4500" {
  name = "fr-udp4500"
  ip_protocol = "UDP"
  port_range = "4500"
  ip_address = data.google_compute_address.vpn-static-ip.address
  target = google_compute_vpn_gateway.target_gateway.id
  project = data.google_project.hub.project_id
  region = var.region
}

# Static route that sends on-premise traffic into the tunnel.
resource "google_compute_route" "route" {
  name = "route"
  network = google_compute_network.hub.name
  project = data.google_project.hub.project_id
  dest_range = var.on_premise_network_ip_range
  priority = 1000
  next_hop_vpn_tunnel = google_compute_vpn_tunnel.tunnel.id
}

# The shared secret and the static IP are created manually before the first run.
data "google_secret_manager_secret_version" "vpn-shared-secret" {
  project = data.google_project.hub.project_id
  secret = "vpn-shared-secret"
}

data "google_compute_address" "vpn-static-ip" {
  project = data.google_project.hub.project_id
  name = "vpn-static-ip"
  region = var.region
}
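The architecture notes mention that the static VPN connection could be replaced by a dynamic one. As a rough sketch only, this is what an HA VPN tunnel with BGP might look like in the hub. It is not part of the original setup: the resource names, both ASNs, the link-local addresses and the single-interface peer gateway are all assumptions, and a single tunnel does not give the redundancy HA VPN is designed for.

```hcl
# Sketch: HA VPN with dynamic routing instead of the Classic VPN tunnel.
# All names, ASNs and 169.254.x.x addresses below are illustrative assumptions.
resource "google_compute_ha_vpn_gateway" "hub" {
  name    = "hub-ha-vpn"
  network = google_compute_network.hub.id
  project = data.google_project.hub.project_id
  region  = var.region
}

# Describes the on-premise VPN device (assumed to expose a single public IP).
resource "google_compute_external_vpn_gateway" "on_premise" {
  name            = "on-premise-gateway"
  project         = data.google_project.hub.project_id
  redundancy_type = "SINGLE_IP_INTERNALLY_REDUNDANT"

  interface {
    id         = 0
    ip_address = var.on_premise_peer_ip
  }
}

resource "google_compute_router" "hub" {
  name    = "hub-router"
  network = google_compute_network.hub.name
  project = data.google_project.hub.project_id
  region  = var.region

  bgp {
    asn = 64515 # assumed private ASN for the Google Cloud side
  }
}

resource "google_compute_vpn_tunnel" "ha_tunnel" {
  name                            = "ha-tunnel-0"
  project                         = data.google_project.hub.project_id
  region                          = var.region
  vpn_gateway                     = google_compute_ha_vpn_gateway.hub.id
  vpn_gateway_interface           = 0
  peer_external_gateway           = google_compute_external_vpn_gateway.on_premise.id
  peer_external_gateway_interface = 0
  shared_secret                   = data.google_secret_manager_secret_version.vpn-shared-secret.secret_data
  router                          = google_compute_router.hub.id
  ike_version                     = 2
}

# BGP session over the tunnel: Google side .1, on-premise side .2 (assumed).
resource "google_compute_router_interface" "ha_tunnel" {
  name       = "ha-tunnel-0-interface"
  project    = data.google_project.hub.project_id
  region     = var.region
  router     = google_compute_router.hub.name
  ip_range   = "169.254.0.1/30"
  vpn_tunnel = google_compute_vpn_tunnel.ha_tunnel.name
}

resource "google_compute_router_peer" "on_premise" {
  name            = "on-premise-peer"
  project         = data.google_project.hub.project_id
  region          = var.region
  router          = google_compute_router.hub.name
  peer_ip_address = "169.254.0.2"
  peer_asn        = 64516 # assumed ASN of the on-premise router
  interface       = google_compute_router_interface.ha_tunnel.name
}
```

With this variant the static route and the forwarding rules would no longer be needed, and the HA VPN gateway receives its addresses automatically instead of using the manually reserved one. The spoke ranges would probably also have to be advertised from the hub Cloud Router through custom route advertisements, since peered subnets are not advertised by default.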
repo-mycompany-network-hub/plan/firewall.tf
resource "google_compute_firewall" "allow-ingress-traffic-from-vpn" {
  name = "allow-ingress-traffic-from-vpn"
  network = google_compute_network.hub.name
  project = data.google_project.hub.project_id
  allow {
    protocol = "tcp"
  }
  source_ranges = [var.on_premise_network_ip_range]
  priority = 1000
  direction = "INGRESS"
}

resource "google_compute_firewall" "allow-egress-traffic-to-vpn" {
  name = "allow-egress-traffic-to-vpn"
  network = google_compute_network.hub.name
  project = data.google_project.hub.project_id
  allow {
    protocol = "tcp"
  }
  destination_ranges = [var.on_premise_network_ip_range]
  priority = 1000
  direction = "EGRESS"
}

# Lower priority (2000) than the allow rules above, so on-premise traffic still passes.
resource "google_compute_firewall" "deny-ingress-traffic-from-internet" {
  name = "deny-all-ingress-traffic"
  network = google_compute_network.hub.name
  project = data.google_project.hub.project_id
  deny {
    protocol = "all"
  }
  source_ranges = ["0.0.0.0/0"]
  priority = 2000
  direction = "INGRESS"
}

resource "google_compute_firewall" "deny-egress-traffic-to-internet" {
  name = "deny-all-egress-traffic"
  network = google_compute_network.hub.name
  project = data.google_project.hub.project_id
  deny {
    protocol = "all"
  }
  destination_ranges = ["0.0.0.0/0"]
  priority = 2000
  direction = "EGRESS"
}
repo-mycompany-network-hub/plan/backend.tf
terraform {
  backend "gcs" {
  }
}
repo-mycompany-network-hub/plan/provider.tf
terraform {
  required_version = ">= 0.12"
  required_providers {
    google = "~> 3.0"
  }
}
repo-mycompany-network-hub/plan/variables.tf
variable "hub_subnet_ip_range" {
  type = string
}

variable "region" {
  type = string
  default = "europe-west1"
}

variable "on_premise_network_ip_range" {
  type = string
}

variable "on_premise_peer_ip" {
  type = string
}
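Since these values are substituted into terraform.tfvars by hand, a typo in a range only shows up at apply time. As an optional sketch, the same variables could carry validation rules so that the plan fails early. This assumes Terraform 0.13 or later (the validation block is not available in plain 0.12), and the error messages are my own wording.

```hcl
# Hypothetical variant of two variables from variables.tf, with early validation.
# Requires Terraform >= 0.13.
variable "hub_subnet_ip_range" {
  type        = string
  description = "Primary CIDR range of the hub subnet."

  validation {
    # cidrhost() errors on a malformed CIDR; can() turns that error into false.
    condition     = can(cidrhost(var.hub_subnet_ip_range, 0))
    error_message = "The hub_subnet_ip_range value must be a valid CIDR block."
  }
}

variable "on_premise_peer_ip" {
  type        = string
  description = "Public IPv4 address of the on-premise VPN device."

  validation {
    # Loose shape check only: four dot-separated groups of one to three digits.
    condition     = can(regex("^(\\d{1,3}\\.){3}\\d{1,3}$", var.on_premise_peer_ip))
    error_message = "The on_premise_peer_ip value must look like an IPv4 address."
  }
}
```

The regular expression only checks the shape of the address, not that each group is at most 255, which is enough to catch an unreplaced placeholder such as <ON_PREMISE_PEER_IP>.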
repo-mycompany-network-hub/plan/terraform.tfvars
hub_subnet_ip_range = "<HUB_SUBNET_IP_RANGE>"
on_premise_peer_ip = "<ON_PREMISE_PEER_IP>"
on_premise_network_ip_range = "<ON_PREMISE_NETWORK_IP_RANGE>"
Spoke Network Project
Create a new project for each spoke:
gcloud projects create mycompany-network-spoke-nonprod
gcloud compute networks delete default --project mycompany-network-spoke-nonprod
gcloud projects create mycompany-network-spoke-prod
gcloud compute networks delete default --project mycompany-network-spoke-prod
The following files create:
- Custom VPC, a subnet and a peering with the hub network.
- Cloud NAT.
repo-mycompany-network-spokes/plan/project.tf
data "google_project" "spoke" {
  project_id = "mycompany-network-spoke-${var.env}"
}

data "google_project" "hub" {
  project_id = "mycompany-network-hub"
}

resource "google_compute_shared_vpc_host_project" "host" {
  project = data.google_project.spoke.project_id
}
repo-mycompany-network-spokes/plan/vpc.tf
resource "google_compute_network" "spoke" {
  name = "spoke"
  auto_create_subnetworks = false
  project = data.google_project.spoke.project_id
}

resource "google_compute_subnetwork" "spoke-subnet" {
  name = "spoke-subnet"
  ip_cidr_range = var.spoke_subnet_ip_range
  region = var.region
  network = google_compute_network.spoke.id
  project = data.google_project.spoke.project_id
  secondary_ip_range = [
    {
      range_name = "pods"
      ip_cidr_range = var.spoke_subnet_pods_ip_range
    },
    {
      range_name = "services"
      ip_cidr_range = var.spoke_subnet_services_ip_range
    }
  ]
}

resource "google_compute_network_peering" "spoke-to-hub" {
  name = "spoke-to-hub"
  network = google_compute_network.spoke.id
  peer_network = data.google_compute_network.hub.self_link
  export_custom_routes = true
  import_custom_routes = true
}

# Could be moved to network hub tf
resource "google_compute_network_peering" "hub-to-spoke" {
  name = "hub-to-spoke"
  network = data.google_compute_network.hub.self_link
  peer_network = google_compute_network.spoke.id
  export_custom_routes = true
  import_custom_routes = true
}

data "google_compute_network" "hub" {
  name = "hub"
  project = data.google_project.hub.project_id
}
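The spoke files in this post do not include firewall rules, although the design relies on them to separate workloads inside a spoke. A minimal sketch of what such a rule could look like is shown here, assuming a hypothetical firewall.tf in the spokes repository. The variable on_premise_network_ip_range does not exist in the spokes repository as shown, and the network tag and port are purely illustrative.

```hcl
# Hypothetical repo-mycompany-network-spokes/plan/firewall.tf
# Assumed extra variable: the spokes repository only defines the spoke ranges.
variable "on_premise_network_ip_range" {
  type = string
}

# Lets on-premise clients reach only the instances carrying the given tag, on HTTPS.
# Every other ingress flow stays blocked by the implied deny rule of the VPC.
resource "google_compute_firewall" "allow-ingress-from-on-premise" {
  name          = "allow-ingress-from-on-premise"
  network       = google_compute_network.spoke.name
  project       = data.google_project.spoke.project_id
  direction     = "INGRESS"
  priority      = 1000
  source_ranges = [var.on_premise_network_ip_range]
  target_tags   = ["on-premise-reachable"] # illustrative tag name

  allow {
    protocol = "tcp"
    ports    = ["443"]
  }
}
```

Scoping the rule with a tag (or a service account) rather than opening the whole subnet is one way to keep workloads inside the same spoke separated from each other.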
repo-mycompany-network-spokes/plan/nat.tf
resource "google_compute_router" "router" {
  name = "router"
  region = google_compute_subnetwork.spoke-subnet.region
  network = google_compute_network.spoke.id
  project = data.google_project.spoke.project_id
  bgp {
    asn = 64514
  }
}

resource "google_compute_router_nat" "nat" {
  name = "nat"
  router = google_compute_router.router.name
  region = google_compute_router.router.region
  project = data.google_project.spoke.project_id
  nat_ip_allocate_option = "MANUAL_ONLY"
  nat_ips = [data.google_compute_address.nat-static-ip1.self_link, data.google_compute_address.nat-static-ip2.self_link]
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}

data "google_compute_address" "nat-static-ip1" {
  project = data.google_project.spoke.project_id
  name = "nat-static-ip1"
  region = var.region
}

data "google_compute_address" "nat-static-ip2" {
  project = data.google_project.spoke.project_id
  name = "nat-static-ip2"
  region = var.region
}
repo-mycompany-network-spokes/plan/backend.tf
terraform {
  backend "gcs" {
  }
}
repo-mycompany-network-spokes/plan/provider.tf
terraform {
  required_version = ">= 0.12"
  required_providers {
    google = "~> 3.0"
  }
}
repo-mycompany-network-spokes/plan/variables.tf
variable "region" {
  type = string
  default = "europe-west1"
}

variable "spoke_subnet_ip_range" {
  type = string
}

variable "spoke_subnet_pods_ip_range" {
  type = string
}

variable "spoke_subnet_services_ip_range" {
  type = string
}

variable "env" {
  type = string
}
repo-mycompany-network-spokes/envs/nonprod/terraform.tfvars
env = "<ENV>"
spoke_subnet_ip_range = "<SPOKE_SUBNET_IP_RANGE>"
spoke_subnet_pods_ip_range = "<SPOKE_SUBNET_PODS_IP_RANGE>"
spoke_subnet_services_ip_range = "<SPOKE_SUBNET_SERVICES_IP_RANGE>"
repo-mycompany-network-spokes/envs/prod/terraform.tfvars
env = "<ENV>"
spoke_subnet_ip_range = "<SPOKE_SUBNET_IP_RANGE>"
spoke_subnet_pods_ip_range = "<SPOKE_SUBNET_PODS_IP_RANGE>"
spoke_subnet_services_ip_range = "<SPOKE_SUBNET_SERVICES_IP_RANGE>"
Deployment
Before running our pipeline in GitLab CI, we first need to create the following resources in the network hub project:
gcloud config set project mycompany-network-hub
gcloud compute addresses create vpn-static-ip --region europe-west1
gcloud services enable secretmanager.googleapis.com
gcloud beta secrets create vpn-shared-secret --locations europe-west1 --replication-policy user-managed
echo -n "<shared_key_here>" | gcloud beta secrets versions add vpn-shared-secret --data-file=-
Note: The static IP address used to establish a static VPN connection should always be created manually. If you ever need to recreate (or have unintentionally destroyed) your VPN tunnel, the on-premise side won't need to reconfigure its end of the tunnel.
And for each spoke project:
gcloud config set project mycompany-network-spoke-<env>
gcloud compute addresses create nat-static-ip1 --region europe-west1
gcloud compute addresses create nat-static-ip2 --region europe-west1
Note: Static IP addresses used to create a NAT Gateway should always be created manually. If you ever need to recreate (or have unintentionally destroyed) the NAT Gateway, the tools and servers that whitelist those IP addresses won't need to update their source IP addresses.
We will also need a bucket for our Terraform state.
## enable apis
gcloud config set project mycompany-secops
gcloud services enable cloudresourcemanager.googleapis.com
gcloud services enable storage.googleapis.com
## create gcs bucket
export REGION_DEFAULT=europe-west1
export BUCKET_NAME=bucket-mycompany-terraform-backend
gsutil mb -c standard -l $REGION_DEFAULT gs://$BUCKET_NAME
gsutil versioning set on gs://$BUCKET_NAME
Note: I recommend centralizing the Terraform state bucket in a dedicated project.
Now we can create our pipelines. The GitLab runner will need the following permissions:
- roles/compute.networkAdmin at network folder level.
- roles/compute.xpnAdmin at spoke folder level.
- roles/storage.objectAdmin on the mycompany-secops project.
Note: To assign permissions to a GitLab runner, please check out my latest article on Securing Google Service Account from Gitlab CI.
Complete the terraform.tfvars:
HUB_SUBNET_IP_RANGE=
ON_PREMISE_PEER_IP=
ON_PREMISE_NETWORK_IP_RANGE=
sed -i "s,<HUB_SUBNET_IP_RANGE>,${HUB_SUBNET_IP_RANGE},g;s,<ON_PREMISE_PEER_IP>,${ON_PREMISE_PEER_IP},g;s,<ON_PREMISE_NETWORK_IP_RANGE>,${ON_PREMISE_NETWORK_IP_RANGE},g" plan/terraform.tfvars
repo-mycompany-network-hub/.gitlab-ci.yml
stages:
  - init
  - deploy

# Install Terraform
.install:
  before_script:
    - apt-get update
    - apt-get install -y zip unzip
    - curl -sS "https://releases.hashicorp.com/terraform/0.14.7/terraform_0.14.7_linux_amd64.zip" > terraform.zip
    - unzip terraform.zip -d /usr/bin

init terraform:
  extends: .install
  stage: init
  image:
    name: google/cloud-sdk
  script:
    - cd plan
    - gcloud config set project mycompany-network-hub
    - terraform init -backend-config="bucket=bucket-mycompany-terraform-backend" -backend-config="prefix=network/hub/terraform/state"
  artifacts:
    paths:
      - plan/.terraform
  only:
    - master
  tags:
    - k8s-network-runner

deploy terraform:
  extends: .install
  stage: deploy
  image:
    name: google/cloud-sdk
  script:
    - cd plan
    - gcloud config set project mycompany-network-hub
    - terraform apply -auto-approve
  only:
    - master
  tags:
    - k8s-network-runner
Complete the terraform.tfvars:
ENV=
SPOKE_SUBNET_IP_RANGE=
SPOKE_SUBNET_PODS_IP_RANGE=
SPOKE_SUBNET_SERVICES_IP_RANGE=
sed -i "s,<ENV>,${ENV},g;s,<SPOKE_SUBNET_IP_RANGE>,${SPOKE_SUBNET_IP_RANGE},g;s,<SPOKE_SUBNET_PODS_IP_RANGE>,${SPOKE_SUBNET_PODS_IP_RANGE},g;s,<SPOKE_SUBNET_SERVICES_IP_RANGE>,${SPOKE_SUBNET_SERVICES_IP_RANGE},g" envs/$ENV/terraform.tfvars
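repo-mycompany-network-spokes/.gitlab-ci.yml
stages:
  - init
  - deploy

# Install Terraform (the google/cloud-sdk image does not ship it)
.install:
  before_script:
    - apt-get update
    - apt-get install -y zip unzip
    - curl -sS "https://releases.hashicorp.com/terraform/0.14.7/terraform_0.14.7_linux_amd64.zip" > terraform.zip
    - unzip terraform.zip -d /usr/bin

# ENV is expected to be provided as a CI/CD variable (nonprod or prod).
init terraform:
  extends: .install
  stage: init
  image:
    name: google/cloud-sdk
  script:
    - cd envs/$ENV
    - gcloud config set project mycompany-network-spoke-$ENV
    - terraform init -backend-config="bucket=bucket-mycompany-terraform-backend" -backend-config="prefix=network/spoke/$ENV/terraform/state" ../../plan/
  artifacts:
    paths:
      - envs/$ENV/.terraform
  only:
    - master
  tags:
    - k8s-network-runner

deploy terraform:
  extends: .install
  stage: deploy
  image:
    name: google/cloud-sdk
  script:
    - cd envs/$ENV
    - gcloud config set project mycompany-network-spoke-$ENV
    - terraform apply -auto-approve ../../plan/
  only:
    - master
  tags:
    - k8s-network-runner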
Once the connection is established between the hub and the spokes, you can attach your service projects to the host projects:
gcloud config set project mycompany-business-$ENV
gcloud beta compute shared-vpc associated-projects add mycompany-business-$ENV --host-project mycompany-network-spoke-$ENV
PROJECT_NUMBER=$(gcloud projects list --filter="$(gcloud config get-value project)" --format="value(PROJECT_NUMBER)")
gcloud projects add-iam-policy-binding mycompany-network-spoke-$ENV --member "serviceAccount:$PROJECT_NUMBER@cloudservices.gserviceaccount.com" --role "roles/compute.networkUser"
gcloud projects add-iam-policy-binding mycompany-network-spoke-$ENV --member "serviceAccount:service-$PROJECT_NUMBER@compute-system.iam.gserviceaccount.com" --role "roles/compute.networkUser"
Compute resources running in the service projects can then reach on-premise workloads, and vice versa, as long as the firewall rules on both sides allow the traffic.
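The attachment done with gcloud can also be expressed in Terraform, which keeps it under the same review process as the rest of the network. The following is only a sketch of that idea, assuming it lives next to the spoke configuration. The variable service_project_id is hypothetical, and only the first of the two IAM bindings is shown.

```hcl
# Hypothetical addition to the spokes repository; service_project_id is an assumed variable.
variable "service_project_id" {
  type = string # for example "mycompany-business-nonprod"
}

data "google_project" "service" {
  project_id = var.service_project_id
}

# Equivalent of "gcloud compute shared-vpc associated-projects add".
resource "google_compute_shared_vpc_service_project" "business" {
  host_project    = data.google_project.spoke.project_id
  service_project = data.google_project.service.project_id

  # The spoke must already be enabled as a Shared VPC host project.
  depends_on = [google_compute_shared_vpc_host_project.host]
}

# Lets the Google APIs service agent of the service project use the shared subnets.
resource "google_project_iam_member" "cloudservices_network_user" {
  project = data.google_project.spoke.project_id
  role    = "roles/compute.networkUser"
  member  = "serviceAccount:${data.google_project.service.number}@cloudservices.gserviceaccount.com"
}
```

A second google_project_iam_member with the same role would cover the other service account used in the gcloud commands.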
Go further
To further secure connectivity, we can enforce some organization policy constraints:
- compute.restrictVpnPeerIPs
- compute.restrictDedicatedInterconnectUsage
- compute.restrictPartnerInterconnectUsage
- compute.restrictVpcPeering
- compute.restrictCloudNATUsage
- compute.restrictXpnProjectLienRemoval
- compute.restrictSharedVpcHostProjects
- compute.restrictSharedVpcSubnetworks
- compute.skipDefaultNetworkCreation
- compute.vmExternalIpAccess
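As an illustration, a few of these constraints could be enforced with Terraform as well. This is a sketch under assumptions: the variable org_id is hypothetical, policies are set at organization level here although a folder may suit you better, and the allowed peer IP simply reuses the on-premise address from the hub configuration.

```hcl
# Hypothetical orgpolicy.tf; org_id is an assumed variable (numeric organization ID).
variable "org_id" {
  type = string
}

# Assumed to be the same value as in the hub repository.
variable "on_premise_peer_ip" {
  type = string
}

# Boolean constraint: new projects are created without the default network.
resource "google_organization_policy" "skip_default_network" {
  org_id     = var.org_id
  constraint = "compute.skipDefaultNetworkCreation"

  boolean_policy {
    enforced = true
  }
}

# List constraint: no VM may be given an external IP address.
resource "google_organization_policy" "vm_external_ip" {
  org_id     = var.org_id
  constraint = "compute.vmExternalIpAccess"

  list_policy {
    deny {
      all = true
    }
  }
}

# List constraint: VPN tunnels may only target the known on-premise peer.
resource "google_organization_policy" "vpn_peer_ips" {
  org_id     = var.org_id
  constraint = "compute.restrictVpnPeerIPs"

  list_policy {
    allow {
      values = [var.on_premise_peer_ip]
    }
  }
}
```

With skipDefaultNetworkCreation enforced, the manual deletion of the default network after each project creation would no longer be needed for new projects.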
Filter on-premise network traffic using a hierarchical firewall at the folder level of the spoke environment.
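A hierarchical firewall policy attached to the spoke folder might look like the sketch below. Treat it as an assumption-heavy outline: the resources come from a more recent google provider release than the 3.0 baseline pinned above, the argument names are written as I recall them from the provider documentation and should be checked against your version, and spoke_folder, the priorities and the ports are illustrative.

```hcl
# Hypothetical hierarchical-firewall.tf; both variables are assumed.
variable "spoke_folder" {
  type = string # expected format: "folders/<FOLDER_ID>"
}

variable "on_premise_network_ip_range" {
  type = string
}

resource "google_compute_firewall_policy" "spoke" {
  parent      = var.spoke_folder
  short_name  = "spoke-on-premise-filter"
  description = "Filters traffic coming from the on-premise network"
}

# Allow HTTPS coming from on-premise.
resource "google_compute_firewall_policy_rule" "allow_on_premise_https" {
  firewall_policy = google_compute_firewall_policy.spoke.id
  description     = "Allow HTTPS from on-premise"
  priority        = 1000
  action          = "allow"
  direction       = "INGRESS"

  match {
    src_ip_ranges = [var.on_premise_network_ip_range]
    layer4_configs {
      ip_protocol = "tcp"
      ports       = ["443"]
    }
  }
}

# Deny everything else coming from on-premise.
resource "google_compute_firewall_policy_rule" "deny_on_premise_other" {
  firewall_policy = google_compute_firewall_policy.spoke.id
  description     = "Deny any other traffic from on-premise"
  priority        = 2000
  action          = "deny"
  direction       = "INGRESS"

  match {
    src_ip_ranges = [var.on_premise_network_ip_range]
    layer4_configs {
      ip_protocol = "all"
    }
  }
}

# Attach the policy to the spoke folder so that it applies to every project below it.
resource "google_compute_firewall_policy_association" "spoke" {
  name              = "spoke-on-premise-filter"
  firewall_policy   = google_compute_firewall_policy.spoke.id
  attachment_target = var.spoke_folder
}
```

Rules in a hierarchical policy are evaluated before the VPC firewall rules of the spoke, so an allow here takes effect regardless of the rules defined in the VPC.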
If your business project runs a private GKE cluster, pods will not be able to reach the on-premise network by default. You will need to force masquerading for all traffic originating from the pods [5].
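One way to do this is to configure the ip-masq-agent described in [5] so that only cluster-internal destinations keep the pod IP as source. The sketch below manages its ConfigMap through the Terraform kubernetes provider, which is my own choice and not something this post sets up. It assumes the provider is already configured against the cluster, that the ip-masq-agent DaemonSet is running, and that cluster_internal_ranges (a hypothetical variable) holds the node, pod and service ranges of the spoke subnet.

```hcl
# Hypothetical configuration in the business project repository.
# Assumes a configured "kubernetes" provider and a running ip-masq-agent DaemonSet.
variable "cluster_internal_ranges" {
  type = list(string) # node, pod and service ranges of the spoke subnet
}

resource "kubernetes_config_map" "ip_masq_agent" {
  metadata {
    name      = "ip-masq-agent"
    namespace = "kube-system"
  }

  data = {
    # Traffic to destinations outside these ranges (on-premise included) is
    # masqueraded, so it leaves with the node IP as its source address.
    config = yamlencode({
      nonMasqueradeCIDRs = var.cluster_internal_ranges
      masqLinkLocal      = false
      resyncInterval     = "60s"
    })
  }
}
```

The on-premise side then only sees node addresses from the primary subnet range, which is the range it already has a route for.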
If you need to resolve DNS between on-premise and your business projects, you can implement a DNS topology similar to the hub-and-spoke model.
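To give an idea of the shape such a DNS topology could take, here is a sketch with a forwarding zone in the hub and a peering zone in each spoke. The zone names, on_premise_dns_name and on_premise_dns_server_ip are assumptions, and the two resources would live in their respective repositories.

```hcl
# Hypothetical variables shared by both snippets.
variable "on_premise_dns_name" {
  type = string # for example "corp.example.com." (trailing dot required)
}

variable "on_premise_dns_server_ip" {
  type = string
}

# Hub repository: forward queries for the on-premise domain to the on-premise resolver.
resource "google_dns_managed_zone" "on_premise_forwarding" {
  name       = "on-premise-forwarding"
  dns_name   = var.on_premise_dns_name
  project    = data.google_project.hub.project_id
  visibility = "private"

  private_visibility_config {
    networks {
      network_url = google_compute_network.hub.self_link
    }
  }

  forwarding_config {
    target_name_servers {
      ipv4_address = var.on_premise_dns_server_ip
    }
  }
}

# Spokes repository: delegate the same domain to the hub VPC through DNS peering.
resource "google_dns_managed_zone" "on_premise_peering" {
  name       = "on-premise-peering"
  dns_name   = var.on_premise_dns_name
  project    = data.google_project.spoke.project_id
  visibility = "private"

  private_visibility_config {
    networks {
      network_url = google_compute_network.spoke.self_link
    }
  }

  peering_config {
    target_network {
      network_url = data.google_compute_network.hub.self_link
    }
  }
}
```

DNS peering is configured separately from VPC peering, so the forwarding logic stays in the hub and each spoke only declares which domain it delegates.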
It could be a subject of a new post 😄.
Conclusion
In this post, we implemented a hub-and-spoke network topology in Google Cloud using Terraform and GitLab CI.
This topology ensures a scalable network architecture for most use cases.
If you have any questions or feedback, please feel free to leave a comment.
Otherwise, do not hesitate to share with peers 😊
Thanks for reading!
Documentation
[1] https://cloud.google.com/solutions/hybrid-and-multi-cloud-network-topologies
[2] https://en.wikipedia.org/wiki/Spoke%E2%80%93hub_distribution_paradigm
[3] https://en.wikipedia.org/wiki/Star_network
[4] https://cloud.google.com/solutions/deploying-nat-gateways-in-a-hub-and-spoke-architecture-using-vpc-network-peering-and-routing
[5] https://cloud.google.com/kubernetes-engine/docs/how-to/ip-masquerade-agent