That's it! We are going to migrate our on-premises applications to Google Cloud!
We start by deploying a first web application to please the business team, set up autoscaling and backups to satisfy the ops team, and encrypt the storage and the database with a key and add a firewall to reassure the security team, all at a lower cost.
Everything has been thought of, the migration of the first application is a real success, and the business agrees to migrate more applications... But wait... no long-term strategy has been defined to make the next migrations easier 😱
Moving to the Public Cloud represents a major change within a company. A phase of adoption and acculturation is necessary for a successful transition.
A lack of long-term vision can be costly when migrating to the cloud, and to Google Cloud in particular. The following must be defined in advance, during the onboarding phase:
- Hierarchical organization of projects
- Network topology
- Centralization of security and monitoring
- The DevOps platform
- A plan for every migration strategy
Have you also found it difficult to go further in your migration? Or are you planning to migrate to Google Cloud?
In this part 1, let's see how to plan a migration to Google Cloud, based on feedback that is rich in lessons.
What is a hybrid Cloud?
According to Wikipedia:
"Hybrid cloud is a composition of a public cloud and a private environment, such as a private cloud or on-premises resources"
Eighty-seven percent of enterprises are taking a hybrid approach during their Cloud Migration [1]
Hybrid setups might be:
- temporary
- maintained only for a limited time to facilitate a migration
- the future state of most organizations
But some companies that choose a hybrid cloud approach migrate without following any cloud adoption framework, or even any training!
Let's look at the most common traps.
Traps to avoid
POC to production
Some companies start by developing a proof of concept (POC) and then put that POC into production.
A POC is disposable; it should be dropped once the concept has been proved.
So if, during the POC, you created a GCP organization, used a CI/CD tool, and deployed the resources using infrastructure as code, it's not a POC but an MVP (Minimum Viable Product), which is different!
I have seen many companies deploy their POC to production and then have to completely refactor their network topology, reorganize their GCP projects, and implement CI/CD and infrastructure as code; in the end it cost more than they expected to save.
On-premises mindset
Some companies moving to the cloud keep following their on-premises practices in the cloud.
Only the "business" workloads should be moved as-is to the cloud; that is what "lift & shift" means. The existing applications for security, networking, monitoring, etc. should be replaced either by the cloud provider's equivalent services or by the cloud versions of those applications.
Here too, I have seen many companies try to move IT software to the cloud that is not able to interact with cloud services.
Avoiding vendor lock-in to the extreme
Some companies moving to the cloud avoid managed services even when they could use them.
Avoiding vendor lock-in makes sense when you are looking for portability of your business applications and you don't want to depend on cloud SDKs. But non-business components such as databases and storage should be hosted on managed services.
There are many other "don'ts" that could be listed in a dedicated post.
So before any move to the cloud, strong sponsorship is needed to adopt a cloud mindset across the organization.
Choose a scalable strategy
When a customer asks to build a hybrid cloud, I always recommend that the decision makers think about a long-term vision with the selected cloud provider:
- Why is the current approach and computing environment insufficient?
- What are the primary metrics that you want to optimize by using the public cloud?
- How long do you plan to use a hybrid architecture? Do you consider this setup permanent, or interim for the length of a full cloud migration? [2]
When the vision is defined, the strategy to adopt is clearer:
- Identifying candidate workloads
- Identifying applicable patterns
- Identifying candidate topologies
- Prioritizing workloads
- Selecting the initial workload to put in the public cloud
- Setting up the Google Cloud organization, projects, and policies
- Implementing the network topology
- Setting up the DevOps platform
- Starting the workload migration
Organization setup
Once you have created the Google Cloud organization for your company via G Suite or Cloud Identity, the first step is to define the resource hierarchy, Google groups, and policies.
Resource hierarchy
Defining a scalable resource hierarchy is very important.
If you start by doing this:
Or even follow this structure:
In the future, when you want to reorganize the resource hierarchy, you will be in big trouble:
What I recommend to customers is to separate the business workloads from the operational workloads.
This hierarchy helps:
- to apply granular permissions to the appropriate teams
- to facilitate infrastructure-as-code automation
- to implement environment-level policy management
- to centralize IT operations
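To make this concrete, here is a minimal sketch of how such a hierarchy could be created with the Resource Manager Python client. The organization ID and the folder names (business, operations, and the per-environment sub-folders) are illustrative assumptions, not a prescribed layout:

```python
# pip install google-cloud-resource-manager
from google.cloud import resourcemanager_v3

ORG = "organizations/123456789012"  # hypothetical organization ID

folders_client = resourcemanager_v3.FoldersClient()

def create_folder(parent: str, display_name: str) -> resourcemanager_v3.Folder:
    """Create a folder under `parent` and wait for the operation to finish."""
    operation = folders_client.create_folder(
        request=resourcemanager_v3.CreateFolderRequest(
            folder=resourcemanager_v3.Folder(parent=parent, display_name=display_name)
        )
    )
    return operation.result()

# Separate business workloads from operational (shared/IT) workloads.
business = create_folder(ORG, "business")
operations = create_folder(ORG, "operations")

# One sub-folder per environment under the business folder.
for env in ("dev", "staging", "prod"):
    create_folder(business.name, env)
```

In practice you would drive this from your infrastructure-as-code tooling rather than ad hoc scripts, but the structure stays the same.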
If your organization depends on Kubernetes, you can have a shared business folder with shared Google Kubernetes Engine clusters. The clusters can be organized in the same way as the Google Cloud hierarchy.
You can also use namespace inheritance on GKE.
Projects
For the same reasons as above, I also recommend one project per application for each environment:
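Below is a minimal sketch of what one project per application and per environment could look like with the Resource Manager Python client. The application name, folder IDs, and the naming convention are illustrative assumptions:

```python
# pip install google-cloud-resource-manager
from google.cloud import resourcemanager_v3

projects_client = resourcemanager_v3.ProjectsClient()

# Hypothetical mapping of environment -> folder resource name.
ENV_FOLDERS = {
    "dev": "folders/111111111111",
    "staging": "folders/222222222222",
    "prod": "folders/333333333333",
}

def create_app_project(app: str, env: str) -> resourcemanager_v3.Project:
    """Create the project `<app>-<env>` under the folder of that environment."""
    operation = projects_client.create_project(
        request=resourcemanager_v3.CreateProjectRequest(
            project=resourcemanager_v3.Project(
                project_id=f"{app}-{env}",
                display_name=f"{app}-{env}",
                parent=ENV_FOLDERS[env],
            )
        )
    )
    return operation.result()

for env in ENV_FOLDERS:
    create_app_project("billing-api", env)  # "billing-api" is an example application
```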
Organization policies
If you have business restrictions such as data location (e.g., GDPR), the best approach is to enable global organization policy constraints during the organization setup.
Depending on your business needs, some other policy constraints can be useful, so do not hesitate to review each of them. [3]
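As a sketch, here is how a data-location constraint could be enforced at the organization level with the Org Policy Python client. The organization ID and the `in:eu-locations` value group are illustrative assumptions:

```python
# pip install google-cloud-org-policy
from google.cloud import orgpolicy_v2

ORG = "organizations/123456789012"  # hypothetical organization ID
client = orgpolicy_v2.OrgPolicyClient()

# Restrict where resources can be created (e.g., EU locations only for GDPR needs).
policy = orgpolicy_v2.Policy(
    name=f"{ORG}/policies/gcp.resourceLocations",
    spec=orgpolicy_v2.PolicySpec(
        rules=[
            orgpolicy_v2.PolicySpec.PolicyRule(
                values=orgpolicy_v2.PolicySpec.PolicyRule.StringValues(
                    allowed_values=["in:eu-locations"]
                )
            )
        ]
    ),
)
client.create_policy(parent=ORG, policy=policy)

# Boolean constraints (like compute.skipDefaultNetworkCreation) follow the same
# pattern, with a rule setting `enforce=True` instead of a list of values.
```

The same call works with a folder or project as the parent, which is how the folder-level constraints mentioned later in this post can be applied.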
Separation of duties
A best practice in Google Cloud is to create Google groups:
collect users with the same responsibilities into groups and assign IAM roles to the groups. [4]
For example, the network team will have network admin access on the network folder.
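That binding could be sketched like this with the Resource Manager Python client; the folder ID and the group email are illustrative assumptions:

```python
# pip install google-cloud-resource-manager
from google.cloud import resourcemanager_v3
from google.iam.v1 import iam_policy_pb2, policy_pb2

NETWORK_FOLDER = "folders/444444444444"            # hypothetical network folder
NETWORK_GROUP = "group:network-team@example.com"   # hypothetical Google group

client = resourcemanager_v3.FoldersClient()

# Read-modify-write of the folder IAM policy: grant the group network admin.
policy = client.get_iam_policy(
    request=iam_policy_pb2.GetIamPolicyRequest(resource=NETWORK_FOLDER)
)
policy.bindings.append(
    policy_pb2.Binding(role="roles/compute.networkAdmin", members=[NETWORK_GROUP])
)
client.set_iam_policy(
    request=iam_policy_pb2.SetIamPolicyRequest(resource=NETWORK_FOLDER, policy=policy)
)
```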
Network
Many types of network topologies exist to ensure communication between network nodes.
Since we have a connection to establish with the on-premises data centers, we need private and secure connectivity with Google Cloud. A common way to achieve this is to implement a hub-and-spoke topology.
1 - Create a network hub project and establish a private connection using Cloud VPN or Cloud Interconnect
To isolate that connectivity from the internet, you can add explicit ingress and egress rules to the VPC firewall.
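As an illustration, an explicit deny-all egress rule on the hub VPC could be sketched like this with the Compute Engine Python client. The project, network name, and priority are illustrative assumptions; your real rules will depend on the flows you need to allow towards on-premises:

```python
# pip install google-cloud-compute
from google.cloud import compute_v1

HUB_PROJECT = "network-hub-project"       # hypothetical hub project
HUB_NETWORK = "global/networks/hub-vpc"   # hypothetical hub VPC

firewalls = compute_v1.FirewallsClient()

# Explicitly deny all egress to the internet; allow only the on-premises ranges
# with higher-priority rules (not shown here).
rule = compute_v1.Firewall(
    name="deny-all-egress-to-internet",
    network=HUB_NETWORK,
    direction="EGRESS",
    priority=65000,
    destination_ranges=["0.0.0.0/0"],
    denied=[compute_v1.Denied(I_p_protocol="all")],
)
firewalls.insert(project=HUB_PROJECT, firewall_resource=rule).result()
```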
Another protection we can add is to enforce some organization policies at the network hub folder level:
compute.restrictVpnPeerIPs
compute.restrictDedicatedInterconnectUsage
compute.restrictPartnerInterconnectUsage
2 - Create a peering connection between each environment (spoke) and the network hub
Since VPC peering is not transitive, the environments will not be able to communicate with each other.
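A minimal sketch of creating the hub side of such a peering with the Compute Engine Python client; the project and network names are illustrative, and the equivalent call has to be made from the spoke side for the peering to become active:

```python
# pip install google-cloud-compute
from google.cloud import compute_v1

networks = compute_v1.NetworksClient()

# Hub side of the peering: export the custom routes learned from on-premises
# (Cloud VPN / Interconnect) so the spoke can reach the data center.
hub_to_dev = compute_v1.NetworksAddPeeringRequest(
    network_peering=compute_v1.NetworkPeering(
        name="hub-to-dev",
        network="projects/dev-project/global/networks/dev-vpc",  # hypothetical spoke VPC
        exchange_subnet_routes=True,
        export_custom_routes=True,
    )
)
networks.add_peering(
    project="network-hub-project",   # hypothetical hub project
    network="hub-vpc",
    networks_add_peering_request_resource=hub_to_dev,
).result()

# Spoke side: same call from dev-project/dev-vpc towards the hub network,
# with import_custom_routes=True instead of export_custom_routes.
```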
A Shared VPC can be a good option if you have a single network team. You can peer up to 25 spokes; beyond that, you can use Cloud VPN connections (up to 100).
If your on-premises environment does not need to communicate with Google Cloud resources, you can keep a one-way exchange of custom routes from hub to spokes.
If your on-premises environments are in the same network, you need to split the network traffic at the spoke level using firewall rules. You can also use hierarchical firewall policies at the folder level [5].
Another protection we can add is to enforce some organization policies at the network folder level:
compute.restrictVpcPeering
compute.restrictCloudNATUsage
3 - Attach the business projects to the spokes
A protection we can add is to enforce some organization policies at the spoke folder level:
compute.restrictXpnProjectLienRemoval
compute.restrictSharedVpcHostProjects
compute.restrictSharedVpcSubnetworks
compute.skipDefaultNetworkCreation
compute.vmExternalIpAccess
4 - Resolve DNS
If you have a DNS resolver on-premises, you can implement a DNS hub-and-spoke model using DNS peering.
Note: if your business project has a private GKE cluster, you will not be able to reach the on-premises network from the pods. You will need to force masquerading for all traffic originating from the pods [8].
I wrote a complete tutorial on Implementing step by step the hub and spoke network topology in Google Cloud
You can also deploy virtual machines as NAT gateways and route the traffic through a centralized pool of appliances. The relevant routes are exported from the hub VPC network into the spoke VPC networks. The NAT gateways are configured with equal cost multi-path (ECMP) routing and autohealing enabled for a more resilient and high-bandwidth deployment. [13]
Cloud NAT isn't supported in this use case because the NAT configuration isn't imported into a peered network.
SecOps
There are three very sensitive resources in GCP that we need to protect at all costs:
- Service accounts
- Cryptographic keys
- Secrets
Compute Engine custom images can also be critical for some organizations.
Service accounts
There are many important best practices to apply [6][15]:
- Delete the default Compute Engine service account and use dedicated custom service accounts
- Create single-purpose service accounts
- Identify and disable unused service accounts
- Use service accounts to apply firewalls
- For easier visibility and auditing, centrally create service accounts in a dedicated project
- For critical projects, some enterprises create service accounts in a dedicated organization
- Don't use service accounts during development
- Don’t embed service account keys in code
- Use service account keys only if there is no viable alternative
The service account management approach that I recommend to customers is to create two types of service accounts:
- service account deployer: used by the CI/CD tool to deploy resources
- service account user: used by the application to access resources
The following diagrams illustrate that usage.
1 - Centralize the service accounts in a security project (except for service accounts used by Compute Engine resources)
You can enforce the organization policy iam.disableServiceAccountCreation (except for the security project).
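A sketch of creating the deployer and user service accounts in a central security project, using the IAM API through the Google API Python client; the project and account names are illustrative assumptions:

```python
# pip install google-api-python-client
import googleapiclient.discovery

SECURITY_PROJECT = "secops-project"  # hypothetical central security project

iam = googleapiclient.discovery.build("iam", "v1")

def create_service_account(account_id: str, display_name: str) -> dict:
    """Create a single-purpose service account in the security project."""
    return (
        iam.projects()
        .serviceAccounts()
        .create(
            name=f"projects/{SECURITY_PROJECT}",
            body={
                "accountId": account_id,
                "serviceAccount": {"displayName": display_name},
            },
        )
        .execute()
    )

# One deployer (used by CI/CD) and one user (used by the app) per application.
create_service_account("billing-api-deployer", "billing-api deployer")
create_service_account("billing-api-user", "billing-api user")
```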
2 - Bind the service account deployers to the DevOps tool workers (example with GitLab)
If you use GitLab runners in GKE with Workload Identity enabled, you can enforce the organization policy iam.disableServiceAccountKeyCreation.
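As a sketch, binding a deployer service account to the GitLab runner's Kubernetes service account through Workload Identity could look like this; the projects, namespace, and account names are illustrative assumptions:

```python
# pip install google-api-python-client
import googleapiclient.discovery

SECURITY_PROJECT = "secops-project"   # hypothetical project hosting the service accounts
RUNNER_PROJECT = "devops-project"     # hypothetical project hosting the GKE runners
DEPLOYER = f"billing-api-deployer@{SECURITY_PROJECT}.iam.gserviceaccount.com"
# Kubernetes service account used by the GitLab runner pods (namespace/name).
KSA_MEMBER = f"serviceAccount:{RUNNER_PROJECT}.svc.id.goog[gitlab-runners/runner-sa]"

iam = googleapiclient.discovery.build("iam", "v1")
resource = f"projects/{SECURITY_PROJECT}/serviceAccounts/{DEPLOYER}"

# Allow the runner's Kubernetes service account to impersonate the deployer
# service account, so no service account key has to be exported.
policy = iam.projects().serviceAccounts().getIamPolicy(resource=resource).execute()
policy.setdefault("bindings", []).append(
    {"role": "roles/iam.workloadIdentityUser", "members": [KSA_MEMBER]}
)
iam.projects().serviceAccounts().setIamPolicy(
    resource=resource, body={"policy": policy}
).execute()
```

The runner's Kubernetes service account also needs the iam.gke.io/gcp-service-account annotation pointing to the deployer account for Workload Identity to take effect.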
3 - Bind the service account users to the business applications (example with Kubernetes workloads)
I wrote a complete tutorial on Securing access to Google Service Account Deployers from Gitlab CI
Cryptographic keys
If you need to use CMEK (customer-managed encryption keys), there are some important best practices to apply [7][9] (a short sketch follows this list):
- Hosting Cloud KMS keys in a separate project,
- For critical projects, some enterprises run Cloud KMS in a separate organization,
- Least privilege and separation of duties.
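A minimal sketch of hosting CMEK keys in a dedicated project with the Cloud KMS Python client; the project, location, and key names are illustrative assumptions:

```python
# pip install google-cloud-kms
from google.cloud import kms

KMS_PROJECT = "security-kms-project"  # hypothetical dedicated KMS project
LOCATION = "europe-west1"

client = kms.KeyManagementServiceClient()
parent = f"projects/{KMS_PROJECT}/locations/{LOCATION}"

# One key ring per application/team keeps IAM permissions granular.
key_ring = client.create_key_ring(
    request={"parent": parent, "key_ring_id": "billing-api", "key_ring": {}}
)

# A symmetric key used as CMEK by the business project (e.g., for GCS or Cloud SQL).
client.create_crypto_key(
    request={
        "parent": key_ring.name,
        "crypto_key_id": "billing-api-cmek",
        "crypto_key": {
            "purpose": kms.CryptoKey.CryptoKeyPurpose.ENCRYPT_DECRYPT,
        },
    }
)
```

Then grant only roles/cloudkms.cryptoKeyEncrypterDecrypter on that key to the service agents of the business projects that need it, in line with least privilege and separation of duties.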
Secrets
In any IT project, you will need two types of credentials:
- Secrets for running your business applications like DB credentials.
- Secrets to access shared applications like Vault, ArgoCD, Git repositories, etc.
In Google Cloud, you can host shared secrets in the SecOps project and keep the secrets used by a business application in that application's GCP project.
There are some important best practices for managing secrets with Google Secret Manager [14] (a short sketch follows this list):
- Choose the automatic replication policy when creating secrets unless your workload has specific location requirements.
- Reference secrets by their version number rather than using the latest alias.
- Disable secret versions before destroying them or deleting secrets.
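A minimal sketch of these practices with the Secret Manager Python client; the project and secret names are illustrative assumptions:

```python
# pip install google-cloud-secret-manager
from google.cloud import secretmanager

PROJECT = "billing-api-prod"  # hypothetical business project
client = secretmanager.SecretManagerServiceClient()

# Create the secret with the automatic replication policy.
secret = client.create_secret(
    request={
        "parent": f"projects/{PROJECT}",
        "secret_id": "db-password",
        "secret": {"replication": {"automatic": {}}},
    }
)

# Add a version, then reference it by its version number rather than "latest".
version = client.add_secret_version(
    request={"parent": secret.name, "payload": {"data": b"s3cr3t"}}
)
payload = client.access_secret_version(request={"name": version.name}).payload.data

# Disable a version before destroying it, so a rollback is still possible.
client.disable_secret_version(request={"name": version.name})
# client.destroy_secret_version(request={"name": version.name})
```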
Custom Images
If your business is based on Compute Engine instances, there are some important best practices for managing custom images [10] (a short sketch follows this list):
- Testing the latest image referenced by the image family before using it in your production environment [11],
- Creating custom images in a separate project, [12]
- Least privilege and separation of duties.
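As a sketch, publishing a custom image into a dedicated images project and consuming the latest image of its family could look like this with the Compute Engine Python client; the project, source disk, and family names are illustrative assumptions:

```python
# pip install google-cloud-compute
from google.cloud import compute_v1

IMAGES_PROJECT = "custom-images-project"  # hypothetical dedicated images project

images = compute_v1.ImagesClient()

# Publish a hardened image into a family hosted in the dedicated project
# (requires read access on the hypothetical source build disk).
new_image = compute_v1.Image(
    name="debian-hardened-v20240101",
    family="debian-hardened",
    source_disk="projects/build-project/zones/europe-west1-b/disks/build-disk",
)
images.insert(project=IMAGES_PROJECT, image_resource=new_image).result()

# Consumers always resolve the latest image of the family, after testing it
# in a non-production environment first.
latest = images.get_from_family(project=IMAGES_PROJECT, family="debian-hardened")
print(latest.name, latest.self_link)
```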
Conclusion
In this part, we saw how easy it is to build a scalable GCP organization, implement a hub-and-spoke topology, and centralize Google service accounts, KMS keys, and Compute Engine custom images.
What's next?
In the second part, we will see how to deploy a DevOps platform using GitLab and following GitOps practices. We will also walk through an example migration of Docker applications to Kubernetes, and finish with cost-saving tips.
Documentation:
[1] https://www.flexera.com/blog/industry-trends/trend-of-cloud-computing-2020/
[2] https://cloud.google.com/solutions/hybrid-and-multi-cloud-patterns-and-practices
[3] https://cloud.google.com/resource-manager/docs/organization-policy/org-policy-constraints
[4] https://cloud.google.com/docs/enterprise/best-practices-for-enterprise-organizations
[5] https://cloud.google.com/vpc/docs/firewall-policies
[6] https://cloud.google.com/iam/docs/best-practices-for-using-and-managing-service-accounts
[7] https://cloud.google.com/kms/docs/separation-of-duties
[8] https://cloud.google.com/kubernetes-engine/docs/how-to/ip-masquerade-agent
[9] https://cloud.google.com/kms/docs/reference/permissions-and-roles#access-control-guidelines
[10] https://cloud.google.com/compute/docs/images/image-management-best-practices
[11] https://cloud.google.com/compute/docs/images/image-families-best-practices
[12] https://cloud.google.com/compute/docs/images/managing-access-custom-images
[13] https://cloud.google.com/solutions/deploying-nat-gateways-in-a-hub-and-spoke-architecture-using-vpc-network-peering-and-routing
[14] https://cloud.google.com/secret-manager/docs/best-practices
[15] https://cloud.google.com/iam/docs/best-practices-for-securing-service-accounts
Reviewers
Thanks Ezzedine for your review 👊🏻