Automatically installing Cloud Operations agents with Agent Policy

Yoshi Yamaguchi - Aug 12 '20 - Dev Community

TL;DR

Agent Policy frees you from manually installing the Google Cloud Logging and Google Cloud Monitoring agents on Google Compute Engine instances.

Preface

In Google Cloud Platform (GCP), Google Cloud Logging and Google Cloud Monitoring (hereinafter Logging and Monitoring) are integrated with other GCP services, so users can observe system/audit logs and system metrics without any configuration. Application and container runtimes such as Google App Engine and Google Kubernetes Engine also send logs and metrics to Logging and Monitoring automatically.

However, when you run applications directly on Google Compute Engine, you have had to install and set up the Logging and Monitoring agents yourself.

How to set up Agent Policy

The required steps are explained in the official documentation; this post adds a demo to show how it works.

Install gcloud alpha components

Agent Policy is still in alpha, so you need to install the gcloud alpha component to try it.

$ gcloud components install alpha

Set up proper privileges

GCP (like any other cloud platform) is always about IAM. Setting up proper privileges and roles is important, and sometimes mandatory, to use specific features. Agent Policy is no exception: you need to grant several IAM roles to several users and service accounts. The good news is that an official supplemental shell script is provided.

The document describes in detail what the script does. In short, it grants OS Config roles (roles/osconfig.*) to the appropriate users and service accounts.
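If you prefer to review IAM changes before applying them, the kind of command that grants a project-level role is gcloud projects add-iam-policy-binding. Below is a minimal sketch, not the official script: the helper name print_grant_commands is hypothetical, it only prints commands, and the roles you pass should be the ones the official document lists.

```shell
# Hypothetical helper (not part of the official script): prints one
# `gcloud projects add-iam-policy-binding` command per role so the
# commands can be reviewed before anything is changed.
# Usage: print_grant_commands PROJECT_ID MEMBER ROLE [ROLE...]
print_grant_commands() {
  local project="$1" member="$2"
  shift 2
  local role
  for role in "$@"; do
    printf 'gcloud projects add-iam-policy-binding %s --member=%s --role=%s\n' \
      "$project" "$member" "$role"
  done
}
```

For example, print_grant_commands agents-install-test user:you@example.com roles/osconfig.guestPolicyAdmin prints a single command; the member and the role are placeholders chosen for illustration. Piping the output to sh would execute it.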

Create Agent Policy

With IAM set up, now for the main part: create an Agent Policy with the gcloud command.

For example, the following command creates an Agent Policy named ops-agents-debian, which installs the Logging and Monitoring agents on new GCE instances created from Debian 10 images. (See the list of public images.)

$ gcloud alpha compute instances ops-agents policies create ops-agents-debian \
  --agent-rules="type=logging,version=current-major,package-state=installed,enable-autoupgrade=true;type=metrics,version=current-major,package-state=installed,enable-autoupgrade=true" \
  --os-types=short-name=debian,version=10

And you're all set! This policy applies to every instance created from a Debian 10 image. In real use cases you may want more specific conditions for applying an Agent Policy, and --group-labels is likely a better choice. See the reference for gcloud alpha compute instances ops-agents policies create for details of all the options.
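As a sketch of how the pieces fit together: --agent-rules takes rules separated by ;, each a comma-separated list of key=value pairs, as in the command above. The hypothetical helpers below only print a create command; they keep the original --os-types condition and add a --group-labels condition. The policy name ops-agents-debian-prod and the label env=prod are placeholders, and the exact label syntax and how the conditions combine should be checked against the command reference.

```shell
# Hypothetical helpers for composing an Agent Policy create command.
# They only print the command; nothing is sent to GCP.

# build_agent_rule TYPE
# Prints one rule in the key=value form used by --agent-rules in the
# original example (current major version, installed, auto-upgrade on).
build_agent_rule() {
  printf 'type=%s,version=current-major,package-state=installed,enable-autoupgrade=true' "$1"
}

# print_policy_command POLICY_ID GROUP_LABELS
# Joins the logging and metrics rules with ";" as in the original command
# and passes GROUP_LABELS through as-is (e.g. "env=prod", a placeholder).
print_policy_command() {
  local policy_id="$1" group_labels="$2"
  local rules
  rules="$(build_agent_rule logging);$(build_agent_rule metrics)"
  printf 'gcloud alpha compute instances ops-agents policies create %s --agent-rules="%s" --os-types=short-name=debian,version=10 --group-labels="%s"\n' \
    "$policy_id" "$rules" "$group_labels"
}
```

Running print_policy_command ops-agents-debian-prod "env=prod" prints a command you can inspect and paste into a shell. Building the rule string in one place avoids typos in the long --agent-rules value.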

Experiments

Case 1: Create a new instance with Debian 10

Let's see whether the policy installs the agents on Debian 10 instances. Run the following command.

$ gcloud compute instances create test0 \
  --image-project=debian-cloud \
  --image-family=debian-10 \
  --zone=us-central1-a \
  --preemptible \
  --boot-disk-auto-delete
Created [https://www.googleapis.com/compute/v1/projects/agents-install-test/zones/us-central1-a/instances/test0].
NAME   ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
test0  us-central1-a  n1-standard-1  true         XX.XX.XX.XX   XX.XX.XX.XX  RUNNING

Then confirm that the Logging and Monitoring agents are installed and running.

$ gcloud compute ssh test0 --zone=us-central1-a
Writing 3 keys to /home/ymotongpoo/.ssh/google_compute_known_hosts
Enter passphrase for key '/home/ymotongpoo/.ssh/google_compute_engine':
Linux test0 4.19.0-10-cloud-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
ymotongpoo@test0:~$ sudo service google-fluentd status
● google-fluentd.service - LSB: data collector for Treasure Data
   Loaded: loaded (/etc/init.d/google-fluentd; generated)
   Active: active (running) since Tue 2020-08-11 08:50:36 UTC; 1min 17s ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 110 (limit: 4373)
   Memory: 66.9M
   CGroup: /system.slice/google-fluentd.service
           └─2128 /opt/google-fluentd/embedded/bin/ruby /usr/sbin/google-fluentd --log /var/log/google-fluentd/goo

Aug 11 08:50:36 test0 systemd[1]: Starting LSB: data collector for Treasure Data...
Aug 11 08:50:36 test0 google-fluentd[2106]: Starting google-fluentd 1.7.1: google-fluentd.
Aug 11 08:50:36 test0 systemd[1]: Started LSB: data collector for Treasure Data.
ymotongpoo@test0:~$ sudo service stackdriver-agent status
● stackdriver-agent.service - LSB: start and stop Stackdriver Agent
   Loaded: loaded (/etc/init.d/stackdriver-agent; generated)
   Active: active (running) since Tue 2020-08-11 08:50:42 UTC; 1min 32s ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 13 (limit: 4373)
   Memory: 6.2M
   CGroup: /system.slice/stackdriver-agent.service
           └─2470 /opt/stackdriver/collectd/sbin/stackdriver-collectd -C /etc/stackdriver/collectd.conf -P /var/ru

Aug 11 08:50:42 test0 collectd[2469]: plugin_load: plugin "write_gcm" successfully loaded.
Aug 11 08:50:42 test0 collectd[2469]: plugin_load: plugin "match_regex" successfully loaded.
Aug 11 08:50:42 test0 collectd[2469]: plugin_load: plugin "match_throttle_metadata_keys" successfully loaded.
Aug 11 08:50:42 test0 collectd[2469]: plugin_load: plugin "stackdriver_agent" successfully loaded.
Aug 11 08:50:42 test0 collectd[2469]: plugin_load: plugin "exec" successfully loaded.
Aug 11 08:50:42 test0 collectd[2469]: plugin_load: plugin "aggregation" successfully loaded.
Aug 11 08:50:42 test0 stackdriver-agent[2449]: .
Aug 11 08:50:42 test0 systemd[1]: Started LSB: start and stop Stackdriver Agent.
Aug 11 08:50:42 test0 collectd[2470]: Initialization complete, entering read-loop.
Aug 11 08:50:42 test0 collectd[2470]: tcpconns plugin: Reading from netlink succeeded. Will use the netlink method

Yes, they are running!
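Checking the two services by hand works, but a small function makes the check repeatable. A minimal sketch, assuming systemd (as on the Debian 10 image above) and the service names google-fluentd and stackdriver-agent shown in the output; check_agents is a hypothetical name.

```shell
# Hypothetical helper: reports whether each agent service is active.
# Returns 0 only if both google-fluentd (Logging) and stackdriver-agent
# (Monitoring) are active according to systemd.
check_agents() {
  local svc status=0
  for svc in google-fluentd stackdriver-agent; do
    if systemctl is-active --quiet "$svc" 2>/dev/null; then
      echo "$svc: active"
    else
      echo "$svc: NOT active"
      status=1
    fi
  done
  return "$status"
}
```

Run it on the instance; a non-zero return means at least one agent is not active. For a one-off remote check, gcloud compute ssh test0 --zone=us-central1-a --command='systemctl is-active google-fluentd stackdriver-agent' prints one state per service.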

Case 2: Create a new instance with CentOS 8

This time, I create an instance with CentOS 8, which is not covered by the ops-agents-debian policy I made.

$ gcloud compute instances create test1 \
  --image-project=centos-cloud \
  --image-family=centos-8 \
  --zone=us-central1-a \
  --preemptible \
  --boot-disk-auto-delete
Created [https://www.googleapis.com/compute/v1/projects/agents-install-test/zones/us-central1-a/instances/test1].
NAME   ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
test1  us-central1-a  n1-standard-1  true         XX.XX.XX.XX   XX.XX.XX.XX  RUNNING

$ gcloud compute ssh test1 --zone=us-central1-a
Writing 3 keys to /home/ymotongpoo/.ssh/google_compute_known_hosts
Enter passphrase for key '/home/ymotongpoo/.ssh/google_compute_engine':
[ymotongpoo@test1 ~]$ sudo service google-fluentd status
Redirecting to /bin/systemctl status google-fluentd.service
Unit google-fluentd.service could not be found.

[ymotongpoo@test1 ~]$ sudo service stackdriver-agent status
Redirecting to /bin/systemctl status stackdriver-agent.service
Unit stackdriver-agent.service could not be found.

This time, the agents are not installed.
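The reason is the policy's --os-types condition, short-name=debian,version=10, which a CentOS 8 image does not match. The function below only illustrates that comparison using /etc/os-release; it is an assumption made for explanation, not how the OS Config service actually evaluates policies, and matches_os_type is a hypothetical name.

```shell
# Hypothetical illustration of an OS-type match (not the real OS Config logic).
# matches_os_type OS_RELEASE_FILE SHORT_NAME VERSION
# Returns 0 when the file's ID equals SHORT_NAME and the major part of its
# VERSION_ID (text before the first ".") equals VERSION.
matches_os_type() {
  local file="$1" want_name="$2" want_version="$3"
  local id version
  # os-release files are shell-compatible, so source them in a subshell.
  id="$(. "$file" && printf '%s' "$ID")"
  version="$(. "$file" && printf '%s' "${VERSION_ID%%.*}")"
  [ "$id" = "$want_name" ] && [ "$version" = "$want_version" ]
}
```

On an instance, matches_os_type /etc/os-release debian 10 && echo covered should print covered on the Debian 10 instance from Case 1 and nothing on the CentOS 8 instance.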

Notes

As of Aug 12th, 2020, this feature is in alpha and is supported only for instances created directly from public images on GCE (the supported short names are centos, debian, rhel, sles, sles-sap and ubuntu). It does not support containers running on GCE.

Also, this feature is available only through the gcloud command, not in the Cloud Console or through public APIs. To check existing policies and their details, run the following commands:

  • gcloud alpha compute instances ops-agents policies list
  • gcloud alpha compute instances ops-agents policies describe POLICY_ID

Because this feature is still in alpha, we would appreciate your feedback. Please send your experiences to ops-agent-policy-feedback@google.com or @ymotongpoo.
