Anthos Service Mesh, Istio on Google Cloud ⛵️

Kevin Davin - May 18 '21 - Dev Community

Anthos Service Mesh, a managed Istio ⛵️

Istio is one of the most advanced pieces of software in the Kubernetes ecosystem. It lets us redefine how our services communicate with each other, without being invasive. Istio works by taking control of the network of your Kubernetes cluster and applying the configuration you give it (through YAML). If you want to discover Istio, I invite you to read the excellent documentation available at istio.io.

istio-logo

The main problem with Istio is the complexity of managing and configuring it. Like Kubernetes, this system is complex, and errors can lead to downtime in your cluster… 😓. Many software companies have started to provide a pre-configured version of Istio, and here we will talk about Anthos Service Mesh.

anthos-logo

Anthos Service Mesh is available on Anthos clusters running in Google Cloud, AWS or on-premises (different features are available depending on where the cluster runs). Here, we will describe the Google Cloud version based on Anthos Service Mesh version 1.9.3-asm.2 (the latest version may be different when you read this article).

NOTE: A fully managed version of Anthos Service Mesh exists but is currently in preview/beta. For now, I prefer using the standard version of Anthos Service Mesh (aka Customer-managed control plane).

Installation

The installation is pretty straightforward: Google provides a script, install_asm, to automate the installation on an already existing GKE cluster:

./install_asm \
  --project_id kevin-anthos-asm \
  --cluster_name anthos-asm-demo  \
  --cluster_location europe-north1-a  \
  --mode install \
  --output_dir ./asm-downloads \
  --enable_all

NOTE: I am installing the latest version of ASM here, but you can choose a different one with the --revision_name parameter if required.

To be executed, the script has some requirements. I invite you to check them here. The best solution is to run the installation from Google Cloud Shell, which fulfills the requirements by default.
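If you run the script outside Cloud Shell, a quick pre-flight check can save a failed run. Here is a minimal sketch: missing_tools is a hypothetical helper of mine (not part of install_asm), and the tool list in the comment is an assumption, so treat the official requirements page as the reference.

```shell
#!/bin/sh
# Hypothetical helper: print every tool from the argument list
# that cannot be found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed tool list (check the official requirements before relying on it):
# missing_tools awk curl gcloud git grep jq kubectl kpt sed tr
```

An empty output means every listed tool was found. The nc warning visible in the logs below suggests netcat is optional.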

install_asm: Setting up necessary files...
install_asm: Fetching/writing GCP credentials to kubeconfig file...
install_asm: [WARNING]: nc not found, skipping k8s connection verification
install_asm: [WARNING]: (Installation will continue normally.)
install_asm: Checking installation tool dependencies...
install_asm: Getting account information...
install_asm: Confirming cluster information for kevin-anthos-asm/europe-north1-a/anthos-asm-demo...
install_asm: Downloading ASM..
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 41.7M  100 41.7M    0     0  31.5M      0  0:00:01  0:00:01 --:--:-- 31.5M
install_asm: Downloading ASM kpt package...
fetching package "/asm" from "https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages" to "asm"
install_asm: Confirming node pool requirements for kevin-anthos-asm/europe-north1-a/anthos-asm-demo...
install_asm: Checking Istio installations...
install_asm: Enabling required APIs...
install_asm: Binding user:kevin.davin@stack-labs.com to required IAM roles...
install_asm: Checking for project kevin-anthos-asm...
install_asm: Reading labels for europe-north1-a/anthos-asm-demo...
install_asm: Adding labels to europe-north1-a/anthos-asm-demo...
install_asm: Enabling Workload Identity on europe-north1-a/anthos-asm-demo...
install_asm: (This could take awhile, up to 10 minutes)
install_asm: Initializing meshconfig API...
install_asm: Enabling Stackdriver on europe-north1-a/anthos-asm-demo...
install_asm: Querying for core/account...
install_asm: Binding kevin.davin@stack-labs.com to cluster admin role...
clusterrolebinding.rbac.authorization.k8s.io/kevin.davin-cluster-admin-binding created
install_asm: Creating istio-system namespace...
namespace/istio-system created
install_asm: Configuring kpt package...
asm/
set 22 field(s) of setter "gcloud.container.cluster" to value "anthos-asm-demo"
asm/
set 40 field(s) of setter "gcloud.core.project" to value "kevin-anthos-asm"
asm/
set 2 field(s) of setter "gcloud.project.projectNumber" to value "62405001080"
asm/
set 6 field(s) of setter "gcloud.project.environProjectNumber" to value "62405001080"
asm/
set 21 field(s) of setter "gcloud.compute.location" to value "europe-north1-a"
asm/
set 2 field(s) of setter "gcloud.compute.network" to value "kevin-anthos-asm-default"
asm/
set 6 field(s) of setter "anthos.servicemesh.rev" to value "asm-193-2"
asm/
set 2 field(s) of setter "anthos.servicemesh.tag" to value "1.9.3-asm.2"
install_asm: Installing validation webhook fix...
service/istiod created
install_asm: Installing ASM control plane...
install_asm: ...done!
install_asm: Installing ASM CanonicalService controller in asm-system namespace...
namespace/asm-system created
customresourcedefinition.apiextensions.k8s.io/canonicalservices.anthos.cloud.google.com created
role.rbac.authorization.k8s.io/canonical-service-leader-election-role created
clusterrole.rbac.authorization.k8s.io/canonical-service-manager-role created
clusterrole.rbac.authorization.k8s.io/canonical-service-metrics-reader created
serviceaccount/canonical-service-account created
rolebinding.rbac.authorization.k8s.io/canonical-service-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/canonical-service-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/canonical-service-proxy-rolebinding created
service/canonical-service-controller-manager-metrics-service created
deployment.apps/canonical-service-controller-manager created
install_asm: Waiting for deployment...
deployment.apps/canonical-service-controller-manager condition met
install_asm: ...done!
install_asm:
install_asm: *****************************
client version: 1.9.3-asm.2
control plane version: 1.9.3-asm.2
data plane version: 1.9.3-asm.2 (2 proxies)
install_asm: *****************************
install_asm: The ASM control plane installation is now complete.
install_asm: To enable automatic sidecar injection on a namespace, you can use the following command:
install_asm: kubectl label namespace <NAMESPACE> istio-injection- istio.io/rev=asm-193-2 --overwrite
install_asm: If you use 'istioctl install' afterwards to modify this installation, you will need
install_asm: to specify the option '--set revision=asm-193-2' to target this control plane
install_asm: instead of installing a new one.
install_asm: To finish the installation, enable Istio sidecar injection and restart your workloads.
install_asm: For more information, see:
install_asm: https://cloud.google.com/service-mesh/docs/proxy-injection
install_asm: The ASM package used for installation can be found at:
install_asm: /home/kevin_davin/anthos/asm/2021-05-11-apres-midi/asm-downloads/asm
install_asm: The version of istioctl that matches the installation can be found at:
install_asm: /home/kevin_davin/anthos/asm/2021-05-11-apres-midi/asm-downloads/istio-1.9.3-asm.2/bin/istioctl
install_asm: A symlink to the istioctl binary can be found at:
install_asm: /home/kevin_davin/anthos/asm/2021-05-11-apres-midi/asm-downloads/istioctl
install_asm: The combined configuration generated for installation can be found at:
install_asm: /home/kevin_davin/anthos/asm/2021-05-11-apres-midi/asm-downloads/asm-193-2-manifest-raw.yaml
install_asm: The full, expanded set of kubernetes resources can be found at:
install_asm: /home/kevin_davin/anthos/asm/2021-05-11-apres-midi/asm-downloads/asm-193-2-manifest-expanded.yaml
install_asm: *****************************
install_asm: Successfully installed ASM.

This script installs a custom version of Istio, named Anthos Service Mesh. At the end, your cluster will have two new namespaces, asm-system and istio-system:

$ kubectl get ns
NAME              STATUS   AGE
asm-system        Active   127m
default           Active   141m
istio-system      Active   128m
kube-node-lease   Active   141m
kube-public       Active   141m
kube-system       Active   141m
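Beyond the namespaces, it can be useful to look at what is running inside them. The sketch below wraps two plain kubectl calls in helper functions; the function names are mine, and it assumes your kubeconfig already points at the cluster.

```shell
#!/bin/sh
# List the pods of the two namespaces created by install_asm.
asm_pods() {
  for ns in istio-system asm-system; do
    echo "--- $ns"
    kubectl get pods --namespace "$ns"
  done
}

# Show the labels of the istiod pods; the istio.io/rev label
# carries the revision (asm-193-2 in this article).
asm_revision_labels() {
  kubectl get pods --namespace istio-system -l app=istiod --show-labels
}
```

Calling asm_pods and then asm_revision_labels gives the pods of the control plane and the revision you will need for the namespace label.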

You have successfully installed Anthos Service Mesh on your GKE cluster… now we have to use it!

Namespace Activation

Istio is a cluster-wide tool, which can be activated at the namespace level or, less commonly, at the component level. We have to add a label to our namespace to enable Istio on it:

$ kubectl create namespace workshop
$ kubectl label namespace workshop istio.io/rev=asm-193-2 --overwrite

Here, asm-193-2 is the revision given in the install_asm command logs. With this label, ASM knows it has to inject a sidecar container into every pod created in this namespace.
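The label only affects pods created after it is set, which is why the install logs ask you to restart your workloads. A minimal sketch of that step, with helper names of my own and assuming your workloads are Deployments:

```shell
#!/bin/sh
# Restart every Deployment of a namespace so that the new pods
# are created with the sidecar injected.
restart_workloads() {
  kubectl rollout restart deployment --namespace "$1"
}

# Print each pod name followed by the names of its containers.
# An injected pod lists an istio-proxy container next to the application one.
list_pod_containers() {
  kubectl get pods --namespace "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[*].name}{"\n"}{end}'
}
```

For the namespace used here, that would be restart_workloads workshop followed by list_pod_containers workshop.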

NOTE: If you want more details on the installation process, the official documentation is available here and provides a lot of information for various use cases.

Functionalities

Anthos Service Mesh is a branded version of Istio. The modifications made by Google are fairly light and are mainly there to make the system compatible with the Cloud Console. With the Customer-managed control plane, you have access to the vast majority of the features provided by Istio 1.9.

You can consult the complete list of available features here.

Dashboards

The main advantage of ASM over the open source version of Istio is its integration into the Google Cloud console.

For this example, I've deployed 3 applications in the workshop namespace. Those applications come from our Stack Labs Workshop on Istio (accessible here, and fully open source). With them running, we can use the dashboards provided.

Global overview

We can have a global view of the microservices deployed in our cluster. A tabular view, which can be filtered by namespace, gives a clear picture of the status and performance of our services.

tabular-view

There is also a global topology view (still in beta 🧪), which allows us to drill down into our services, components, deployments, pods… very useful if we want to understand the communication patterns in our cluster.

topology-raw
topology-inside-service
topology-inside-deployment

If we want a deeper understanding of each component, we have access to a service-specific view, accessible by clicking on a service in the tabular view.

Service Dashboard

This dashboard provides a specific view of the behaviour of your service. Here, we will focus on the middleware service. For this example, we configured this application to send back 500 errors 50% of the time.
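In the workshop, the error rate comes from the application itself. If your own application has no such switch, a similar effect can be approximated with Istio's fault injection instead, which is a different technique from the one used here. A minimal sketch, where the helper name is mine and the host and namespace names are assumptions based on this example:

```shell
#!/bin/sh
# Print a VirtualService that aborts 50% of the requests with an HTTP 500.
# Assumptions: the service is named "middleware" and lives in "workshop".
fault_virtualservice() {
  cat <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: middleware
  namespace: workshop
spec:
  hosts:
  - middleware
  http:
  - fault:
      abort:
        percentage:
          value: 50
        httpStatus: 500
    route:
    - destination:
        host: middleware
EOF
}

# To apply it on the cluster:
# fault_virtualservice | kubectl apply -f -
```

The abort is applied by the sidecar of the calling service, so it only affects traffic coming from workloads that are part of the mesh.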

middleware-details

This main view summarizes all the following dashboards in one place. It is the entry point for our service analysis.

slo-overview

The Health view presents the Service Level Objectives (SLOs), based on the Service Level Indicators (SLIs) we defined on our service.

NOTE: If you want to learn more about SLOs and SLIs, this blog post gives a (very) condensed summary of the idea behind them. You can also read the documentation and books provided freely by the Google SRE team here.

We can define SLOs and SLIs on our service using the various metrics gathered for us by the Cloud Console and Anthos Service Mesh.

define-sli
define-slo
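To make the relationship between the two concrete, here is a small worked example of an availability SLI checked against an SLO target. It is only a sketch: the function names and the 99% target are hypothetical and are not the values configured in the console.

```shell
#!/bin/sh
# Availability SLI: percentage of good requests over total requests.
availability_sli() {
  awk -v good="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * good / total }'
}

# Succeeds (exit code 0) when the SLI meets the SLO target, both in percent.
meets_slo() {
  awk -v sli="$1" -v slo="$2" 'BEGIN { exit !(sli >= slo) }'
}

# Example: 500 good requests out of 1000 against a hypothetical 99% target.
# sli="$(availability_sli 500 1000)"
# meets_slo "$sli" 99 || echo "SLO not met: ${sli}% < 99%"
```

With a service answering 500 errors half of the time, the SLI sits around 50%, far below any reasonable target, which is why the SLO is reported as violated.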

You can define an alerting strategy for each SLO. Multiple notification channels are available, from the most standard ones (Slack, PagerDuty, email) to webhooks, Cloud Functions or any other programmable system. I chose email for this example and received this after a few minutes, because the defined SLO was no longer met.

alert by email

We have access to a global metrics view of the service: CPU, RAM, requests per second… all the information required to follow the health of the application.

metrics

We can analyse all the connectivity of our service. Here, we have a complete list of every service connecting to our service (inbound) and every service reached by our service (outbound).

inbound
outbound

The Infrastructure pane allows us to see every instance of our service over time. Each of them has its own metrics (CPU, RAM, error rate…), so we can analyse the performance and behaviour of our system at a fine-grained level.

infrastructure

The Security pane allows us to analyse the security level of the communication between services. Istio and ASM provide a built-in way to communicate with mTLS between components. Here, you will see whether exchanges are made "in clear" or with a secure protocol. I didn't configure anything and communications between frontend, middleware and database are secured by default.

security inbound
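By default, Istio secures traffic between sidecars with mTLS but still accepts plain-text requests (PERMISSIVE mode). If you want to refuse plain text entirely in a namespace, a PeerAuthentication resource can enforce it. A minimal sketch, assuming the workshop namespace of this article and that ASM behaves like upstream Istio on this point; the helper name is mine:

```shell
#!/bin/sh
# Print a PeerAuthentication that only accepts mTLS traffic
# for every workload of the "workshop" namespace.
strict_mtls_policy() {
  cat <<'EOF'
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: workshop
spec:
  mtls:
    mode: STRICT
EOF
}

# To apply it on the cluster:
# strict_mtls_policy | kubectl apply -f -
```

Once applied, any client without a sidecar calling these services is rejected, so check the Security pane first to be sure nothing still talks in clear.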

Finally, a useful view is available to consult the YAML resources deployed in the cluster that correspond to this service. Deployment, VirtualService, DestinationRule… All resources are available from the web UI, simplifying analysis once again.

resources
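For comparison, here is roughly what you would run by hand to get the same information from a terminal. It is a sketch with a helper name of my own, and unlike the console it returns everything in the namespace rather than only the resources tied to one service.

```shell
#!/bin/sh
# Dump the Deployments, VirtualServices and DestinationRules
# of a namespace as YAML.
service_resources() {
  kubectl get deployments,virtualservices,destinationrules \
    --namespace "$1" -o yaml
}
```

For this article, that would be service_resources workshop.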

Timeline everywhere!

When you analyze production, especially when a problem occurs, you can't rely only on current data: you have to compare it with past data from the same system to reach a conclusion. Here, in every dashboard I have introduced, you can activate a timeline and narrow down your observations to a specific period of time.

timeline in metrics

Every table, graph and metric adapts to the given time span to show you the information at a specific moment. This is convenient when you want to compare the behavior of a service before and after an upgrade, for example.

timeline in metrics

This feature is really awesome because you won't have to configure a dashboard for each need. The system provided is made for operators: no need to write a custom PromQL query or Grafana dashboard to analyse what's currently happening in production…

Conclusion

I focused this article mainly on Observability, because Anthos Service Mesh provides it out of the box. Even if Istio has wonderful features for traffic splitting, mirroring, authorization… the first reason you will want to use it is Observability 🕵️‍♂️.

Google is now doing with Istio what it did with Kubernetes many years ago: it integrated it and simplified its usage to make it easily available to everyone. The future version, with a Google-managed control plane, should simplify it even more.

If you want to increase your observability with a managed and preconfigured system, I advise you to test Anthos Service Mesh!
