SRE book notes: The Evolution of Automation at Google

Hercules Lemke Merscher - Jan 19 '23 - Dev Community

These are my notes on Chapter 7, The Evolution of Automation at Google, from the book Site Reliability Engineering: How Google Runs Production Systems.

This post is part of a series.


doing automation thoughtlessly can create as many problems as it solves


It isn’t appropriate to automate every component of every system, and not everyone has the ability or inclination to develop automation at a particular time. Some essential systems started out as quick prototypes, not designed to last or to interface with automation.


Automate Yourself Out of a Job: Automate ALL the Things!



We graduated from optimizing our infrastructure for a lack of failover to embracing the idea that failure is inevitable, and therefore optimizing to recover quickly through automation.


A team not running automation has no incentive to build systems that are easy to automate.


The most functional tools are usually written by those who use them.


shipping and iterating rapidly might allow you to implement functionality faster, yet rarely makes for a resilient system.


A post worth reading, from the Engine Yard blog:

Pets vs. Cattle – EngineYard

The difference between the pre-virtualisation model and the post-virtualisation model can be thought of as the difference between pets and cattle.

engineyard.com

If you liked this post, consider subscribing to my newsletter Bit Maybe Wise.

You can also follow me on Twitter and Mastodon.


Photo by Lenny Kuhne on Unsplash
