TL;DR
Agent Policy frees you from manually installing the Google Cloud Logging and Google Cloud Monitoring agents on Google Compute Engine instances.
Preface
In Google Cloud Platform (GCP), Google Cloud Logging and Google Cloud Monitoring (hereinafter called Logging and Monitoring) are integrated with peer GCP services, so users can observe system/audit logs and system metrics without any configuration. Also, application and container runtimes such as Google App Engine and Google Kubernetes Engine send logs and metrics to Logging and Monitoring automatically.
However, when you run applications directly on top of Google Compute Engine, you have had to install and set up the agents for Google Cloud Logging and Google Cloud Monitoring all by yourself.
How to set up Agent Policy
The required steps are explained in the official document, and this post provides an additional demo to show how it works.
Install gcloud alpha components
Agent Policy is still in alpha, and you need to install the alpha components to try it.
$ gcloud components install alpha
Set up proper privileges
GCP, like any other cloud platform, is all about IAM. Setting up the proper privileges and roles is important, and often mandatory for specific features. Agent Policy is no exception: you need to grant multiple IAM roles to multiple users and service accounts. The good news is that an official supplemental shell script is provided.
The details of what it does are well described in the document. In short, it assigns roles/osconfig to the appropriate users and service accounts.
Create Agent Policy
After setting up IAM, we come to the main part: create an Agent Policy with the gcloud command.
For example, the following command creates an Agent Policy named ops-agents-debian, which installs the Logging and Monitoring agents on new GCE instances created from Debian 10 images. (See the list of public images)
$ gcloud alpha compute instances ops-agents policies create ops-agents-debian \
--agent-rules="type=logging,version=current-major,package-state=installed,enable-autoupgrade=true;type=metrics,version=current-major,package-state=installed,enable-autoupgrade=true" \
--os-types=short-name=debian,version=10
And all set! This example applies to all Debian 10 images. In real use cases, you may want more complex conditions for applying an Agent Policy, and using --group-labels is probably a better idea. For details, see the options of gcloud alpha compute instances ops-agents policies create.
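The --agent-rules value above is one long string, which is easy to mistype. As a minimal sketch (not part of the official procedure), the same value can be assembled from one rule per agent type. The helper name agent_rule and the variable AGENT_RULES are hypothetical, introduced only for illustration; the field names and values are copied from the command above.

```shell
#!/bin/sh
# Build a single rule of the --agent-rules value for the given agent type.
# agent_rule is a hypothetical helper; the fields mirror the command above.
agent_rule() {
  printf 'type=%s,version=current-major,package-state=installed,enable-autoupgrade=true' "$1"
}

# Rules for different agent types are joined with ";" in the flag value.
AGENT_RULES="$(agent_rule logging);$(agent_rule metrics)"
echo "$AGENT_RULES"

# The gcloud call is left commented out so this sketch is safe to run anywhere:
# gcloud alpha compute instances ops-agents policies create ops-agents-debian \
#   --agent-rules="$AGENT_RULES" \
#   --os-types=short-name=debian,version=10
```

Printing the assembled string before passing it to gcloud makes it easy to compare against the original one-liner.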
Experiments
Case 1: Create a new instance with Debian 10
Let's see if the policy installs agents to Debian 10 instances. Run the following command.
$ gcloud compute instances create test0 \
--image-project=debian-cloud \
--image-family=debian-10 \
--zone=us-central1-a \
--preemptible \
--boot-disk-auto-delete
Created [https://www.googleapis.com/compute/v1/projects/agents-install-test/zones/us-central1-a/instances/test0].
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
test0 us-central1-a n1-standard-1 true XX.XX.XX.XX XX.XX.XX.XX RUNNING
Then confirm that the Logging and Monitoring agents are installed and running.
$ gcloud compute ssh test0 --zone=us-central1-a
Writing 3 keys to /home/ymotongpoo/.ssh/google_compute_known_hosts
Enter passphrase for key '/home/ymotongpoo/.ssh/google_compute_engine':
Linux test0 4.19.0-10-cloud-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
ymotongpoo@test0:~$ sudo service google-fluentd status
● google-fluentd.service - LSB: data collector for Treasure Data
Loaded: loaded (/etc/init.d/google-fluentd; generated)
Active: active (running) since Tue 2020-08-11 08:50:36 UTC; 1min 17s ago
Docs: man:systemd-sysv-generator(8)
Tasks: 110 (limit: 4373)
Memory: 66.9M
CGroup: /system.slice/google-fluentd.service
└─2128 /opt/google-fluentd/embedded/bin/ruby /usr/sbin/google-fluentd --log /var/log/google-fluentd/goo
Aug 11 08:50:36 test0 systemd[1]: Starting LSB: data collector for Treasure Data...
Aug 11 08:50:36 test0 google-fluentd[2106]: Starting google-fluentd 1.7.1: google-fluentd.
Aug 11 08:50:36 test0 systemd[1]: Started LSB: data collector for Treasure Data.
ymotongpoo@test0:~$ sudo service stackdriver-agent status
● stackdriver-agent.service - LSB: start and stop Stackdriver Agent
Loaded: loaded (/etc/init.d/stackdriver-agent; generated)
Active: active (running) since Tue 2020-08-11 08:50:42 UTC; 1min 32s ago
Docs: man:systemd-sysv-generator(8)
Tasks: 13 (limit: 4373)
Memory: 6.2M
CGroup: /system.slice/stackdriver-agent.service
└─2470 /opt/stackdriver/collectd/sbin/stackdriver-collectd -C /etc/stackdriver/collectd.conf -P /var/ru
Aug 11 08:50:42 test0 collectd[2469]: plugin_load: plugin "write_gcm" successfully loaded.
Aug 11 08:50:42 test0 collectd[2469]: plugin_load: plugin "match_regex" successfully loaded.
Aug 11 08:50:42 test0 collectd[2469]: plugin_load: plugin "match_throttle_metadata_keys" successfully loaded.
Aug 11 08:50:42 test0 collectd[2469]: plugin_load: plugin "stackdriver_agent" successfully loaded.
Aug 11 08:50:42 test0 collectd[2469]: plugin_load: plugin "exec" successfully loaded.
Aug 11 08:50:42 test0 collectd[2469]: plugin_load: plugin "aggregation" successfully loaded.
Aug 11 08:50:42 test0 stackdriver-agent[2449]: .
Aug 11 08:50:42 test0 systemd[1]: Started LSB: start and stop Stackdriver Agent.
Aug 11 08:50:42 test0 collectd[2470]: Initialization complete, entering read-loop.
Aug 11 08:50:42 test0 collectd[2470]: tcpconns plugin: Reading from netlink succeeded. Will use the netlink method
Yes, they are running!
Case 2: Create a new instance with CentOS 8
This time, I create an instance with CentOS 8, to which the ops-agents-debian policy I made doesn't apply.
$ gcloud compute instances create test1 \
--image-project=centos-cloud \
--image-family=centos-8 \
--zone=us-central1-a \
--preemptible \
--boot-disk-auto-delete
Created [https://www.googleapis.com/compute/v1/projects/agents-install-test/zones/us-central1-a/instances/test1].
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
test1 us-central1-a n1-standard-1 true XX.XX.XX.XX XX.XX.XX.XX RUNNING
$ gcloud compute ssh test1 --zone=us-central1-a
Writing 3 keys to /home/ymotongpoo/.ssh/google_compute_known_hosts
Enter passphrase for key '/home/ymotongpoo/.ssh/google_compute_engine':
[ymotongpoo@test1 ~]$ sudo service google-fluentd status
Redirecting to /bin/systemctl status google-fluentd.service
Unit google-fluentd.service could not be found.
[ymotongpoo@test1 ~]$ sudo service stackdriver-agent status
Redirecting to /bin/systemctl status stackdriver-agent.service
Unit stackdriver-agent.service could not be found.
This time, the agents are not installed.
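The manual checks in both cases can be wrapped in a small script to run on the instance. This is only a sketch under a few assumptions: check_agents and the STATUS_CMD override are hypothetical names introduced here, and the default check uses systemctl is-active --quiet, which assumes a systemd-based image like the two used above. The service names are the ones shown in the sessions above.

```shell
#!/bin/sh
# Report whether each agent service is running on this machine.
# STATUS_CMD is a hypothetical override hook (handy for dry runs); by default
# the check is `systemctl is-active --quiet`, which exits 0 only for active units.
check_agents() {
  status_cmd=${STATUS_CMD:-"systemctl is-active --quiet"}
  missing=0
  for svc in google-fluentd stackdriver-agent; do
    # $status_cmd is intentionally unquoted so it splits into command + args.
    if $status_cmd "$svc"; then
      echo "$svc: running"
    else
      echo "$svc: not running"
      missing=1
    fi
  done
  # Non-zero exit status if at least one agent is not running.
  return "$missing"
}
```

On an instance covered by the policy, calling check_agents should report both services as running; on the CentOS 8 instance it should report both as not running and return a non-zero status.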
Notes
As of Aug 12th, 2020, this feature is in alpha and is only supported for direct use of public images on GCE. (The supported short names are centos, debian, rhel, sles, sles-sap and ubuntu.) It doesn't support containers running on GCE.
Also, this only works with the gcloud command and is not available in Cloud Console or the public APIs. To list existing policies and to see the details of one, run the following commands respectively.
$ gcloud alpha compute instances ops-agents policies list
$ gcloud alpha compute instances ops-agents policies describe POLICY_ID
Because this is still in alpha, we appreciate your feedback. Please send your experiences to ops-agent-policy-feedback@google.com or @ymotongpoo.