Moving your cron job to the cloud with Google Cloud Functions

Dustin Ingram - May 3 '19 - Dev Community

A cron job is a way to run a program on a server at a specified interval. This is often a small script that does some repeatable task, like collecting metrics, checking the health of a service, or taking a snapshot of a dataset.

The cron utility uses a highly customizable format, called a crontab file, that lets us run essentially any command at whatever interval we want. A script can run as often as once a minute or as rarely as once a year.

Most often, a cron job takes some input, does a little processing, and then generates some output. The input could be a web page, another application, a database, etc. The output might be anything from adding a row to a table in a database, to putting a file in storage, to sending a message.

Using cron jobs

Cron jobs have a lot of uses: essentially anything you want to happen in a repeatable way can be made into a cron job.

One thing I like to use cron jobs for is keeping an eye on websites that I visit infrequently. For example, I'd like to see articles about Serverless that are posted to Hacker News, but I don't have time to check the site every day and read the title of every post to see if it's about Serverless.

Instead, I can write a cron job to do this for me. The steps will be roughly as follows:

  • Once a day, get the top stories from the Hacker News API;
  • Iterate over every story;
  • If any of their titles contain "serverless":
    • send an email with the links

Here's the script:

#!/usr/bin/env python3

import requests
from utils import send_email

api_url = 'https://hacker-news.firebaseio.com/v0/'
top_stories_url = api_url + 'topstories.json'
item_url = api_url + 'item/{}.json'

def scan_hacker_news():
    # Make a request to the API
    top_stories = requests.get(top_stories_url).json()
    serverless_stories = []

    # Iterate over every story
    for story_id in top_stories:
        story = requests.get(item_url.format(story_id)).json()
        # Skip items that come back empty or without a title
        if story and 'serverless' in story.get('title', '').lower():
            serverless_stories.append(story)

    # If there are any, send an email
    if serverless_stories:
        send_email(serverless_stories)

if __name__ == '__main__':
    # cron executes this file directly, so call the function here
    scan_hacker_news()

This makes a request to the API, iterates over every story, keeps only the stories that have "serverless" in the title, and sends me an email (we'll leave the send_email function as an exercise for the reader).
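If you do want a starting point for that exercise, here is a minimal sketch of a utils.py that provides send_email using Python's standard smtplib and email modules. Everything in it is hypothetical: the sender, recipient, and SMTP host are placeholders, and it assumes an SMTP server is reachable on localhost. It relies on Hacker News items carrying id and title fields, with url missing for text posts such as "Ask HN".

```python
# utils.py -- a hypothetical sketch of the send_email helper the script imports.
# SENDER, RECIPIENT, and SMTP_HOST are placeholders, not values from the article.
import smtplib
from email.message import EmailMessage

SENDER = 'alerts@example.com'   # placeholder address
RECIPIENT = 'me@example.com'    # placeholder address
SMTP_HOST = 'localhost'         # assumes an SMTP server is listening here


def build_email(stories):
    """Build a plain-text message listing each story's title and link."""
    msg = EmailMessage()
    msg['Subject'] = '{} new serverless stories on Hacker News'.format(len(stories))
    msg['From'] = SENDER
    msg['To'] = RECIPIENT
    lines = []
    for story in stories:
        # Text posts have no 'url', so fall back to the HN discussion page
        link = story.get('url') or 'https://news.ycombinator.com/item?id={}'.format(story['id'])
        lines.append('{}\n{}'.format(story['title'], link))
    msg.set_content('\n\n'.join(lines))
    return msg


def send_email(stories):
    """Send the digest through the configured SMTP server."""
    with smtplib.SMTP(SMTP_HOST) as server:
        server.send_message(build_email(stories))
```

Splitting message construction out of send_email keeps the formatting logic easy to check without a mail server.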

If I wanted to set this up as a cron job on my Linux machine, I would save it to a file (send_me_pythons.py), make it executable (chmod 755 send_me_pythons.py) and put it in my local bin (/usr/local/bin/send_me_pythons.py).

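Then, I would edit my crontab (by running crontab -e) and add the following line (entries in the system-wide /etc/crontab also need a user name between the schedule and the command):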

0 0 * * * /usr/local/bin/send_me_pythons.py

This runs the script once a day at midnight. I like to use https://crontab.guru/ to ensure that my crontab is right before I set it. Here's a diagram of each field:

# 0 0 * * * /usr/local/bin/send_me_pythons.py
# │ │ │ │ │ └── run this script
# │ │ │ │ └──── every day of the week
# │ │ │ └────── every month
# │ │ └──────── every day of the month
# │ └────────── at hour zero
# └──────────── at minute zero
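A standard crontab entry is five whitespace-separated schedule fields (minute, hour, day of month, month, day of week) followed by the command. As a rough illustration, this minimal sketch splits an entry into those labeled parts. parse_crontab_line is a hypothetical helper, and it does not validate field values or handle cron extensions such as @daily.

```python
# A minimal sketch: split a five-field crontab entry into labeled schedule
# fields plus the command. It does not validate the field values.
CRON_FIELDS = ('minute', 'hour', 'day_of_month', 'month', 'day_of_week')


def parse_crontab_line(line):
    """Return a dict of the five schedule fields and the command."""
    # Split at most five times so a command containing spaces stays intact
    parts = line.split(None, len(CRON_FIELDS))
    if len(parts) != len(CRON_FIELDS) + 1:
        raise ValueError('expected five schedule fields followed by a command')
    entry = dict(zip(CRON_FIELDS, parts))
    entry['command'] = parts[-1]
    return entry


entry = parse_crontab_line('0 0 * * * /usr/local/bin/send_me_pythons.py')
print(entry['minute'], entry['hour'], entry['day_of_week'])  # 0 0 *
```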

Once the crontab is saved, the cron daemon will pick up the new job and run it as often as I've specified.

Moving our script to the cloud

This is great, but it has one huge downside: I have to keep my server up and running to make sure the emails get sent, or I'll miss valuable Serverless content. That leaves two options: either I use this server for other things besides running this little script once a day, or I keep an entire server running just for this little script! Not ideal.

Instead, let's take this function to the cloud. First, we'll turn it into a Google Cloud Function: we'll rename our existing function to send_pythons, give it the request argument that HTTP functions receive, and put it in a file called main.py:

import requests
from utils import send_email

api_url = 'https://hacker-news.firebaseio.com/v0/'
top_stories_url = api_url + 'topstories.json'
item_url = api_url + 'item/{}.json'

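def send_pythons(request):
    # `request` is the incoming HTTP request (a flask.Request); it's unused here
    # Make a request to the API
    top_stories = requests.get(top_stories_url).json()
    serverless_stories = []

    # Iterate over every story
    for story_id in top_stories:
        story = requests.get(item_url.format(story_id)).json()
        # Skip items that come back empty or without a title
        if story and 'serverless' in story.get('title', '').lower():
            serverless_stories.append(story)

    # If there are any, send an email
    if serverless_stories:
        send_email(serverless_stories)

    # Return a response body so the HTTP call completes successfully
    return 'Found {} stories'.format(len(serverless_stories))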

Note: Only the lines that need to run every time the function is called need to be inside the send_pythons function. The import statements and URL constants only need to be executed once, when the function is loaded, so they can stay outside the function. Also keep utils.py in the same directory as main.py so that it gets deployed along with it.
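Following the same idea, one option is to move the per-call filtering into a plain function that the HTTP handler just calls, so it can be tested locally without hitting the real API. This is a sketch, not the article's code: find_serverless_stories and its fetch_json parameter are hypothetical names, where fetch_json is any callable that takes a URL and returns decoded JSON. Skipping empty or title-less items is a defensive assumption about what the item endpoint may return.

```python
# A sketch of keeping the per-call logic in a plain, testable function.
# `fetch_json` is a hypothetical parameter: any callable mapping a URL to decoded JSON.
api_url = 'https://hacker-news.firebaseio.com/v0/'
top_stories_url = api_url + 'topstories.json'
item_url = api_url + 'item/{}.json'


def find_serverless_stories(fetch_json, keyword='serverless'):
    """Return the top stories whose titles contain `keyword` (case-insensitive)."""
    matches = []
    for story_id in fetch_json(top_stories_url):
        story = fetch_json(item_url.format(story_id))
        # Defensive (assumption): skip items that are empty or have no title
        if not story or 'title' not in story:
            continue
        if keyword in story['title'].lower():
            matches.append(story)
    return matches

# Hypothetical wiring in main.py:
# def send_pythons(request):
#     stories = find_serverless_stories(lambda url: requests.get(url).json())
#     if stories:
#         send_email(stories)
#     return 'Found {} stories'.format(len(stories))
```

In a local test, a dictionary's get method can stand in for fetch_json, mapping each URL to canned JSON.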

Next, we'll define our dependencies. We're using requests, so we need to put it in our requirements.txt:

requests==2.20.0

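Then, we can deploy this with the gcloud command line tool. By default, the name we deploy under is also used as the entry point, so it should match the Python function name, send_pythons:

$ gcloud beta functions deploy send_pythons --runtime python37 --trigger-http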

This will give us an endpoint, something like:

https://us-central1-[PROJECT_ID].cloudfunctions.net/send_pythons

And making an HTTP GET request to that endpoint will result in our function being run.

Scheduling our script in the cloud

Now that we've got our script running as a function in the cloud, all we need to do is schedule it. We can create a new Google Cloud Scheduler job with the gcloud command line tool:

$ gcloud beta scheduler jobs create http send_pythons_job \
    --schedule="0 0 * * *" \
    --uri=https://us-central1-[PROJECT_ID].cloudfunctions.net/send_pythons

This specifies a name for our job, send_pythons_job, which must be unique within the project. It also uses the same crontab schedule we set above and points the job at the HTTP function we created.

We can list our job:

$ gcloud beta scheduler jobs list --project=$PROJECT
ID                LOCATION     SCHEDULE (TZ)        TARGET_TYPE  STATE
send_pythons_job  us-central1  0 0 * * * (Etc/UTC)  HTTP         ENABLED
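The SCHEDULE (TZ) column shows the schedule is evaluated in Etc/UTC, so "0 0 * * *" fires at midnight UTC, which may not be midnight where you are (the gcloud scheduler job commands accept a --time-zone flag if you want a different zone). As a rough illustration, this sketch computes when a daily minute/hour schedule evaluated in UTC fires next. next_daily_run is a hypothetical helper, and the fixed UTC-7 offset in the example is an assumption that ignores daylight-saving changes.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now, hour=0, minute=0):
    """Return the first time strictly after `now` that a daily
    'minute hour * * *' schedule evaluated in UTC would fire.

    `now` must be timezone-aware; the result is in UTC.
    """
    now_utc = now.astimezone(timezone.utc)
    # Today's firing time, in UTC
    candidate = now_utc.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If that time has already passed (or is exactly now), the next run is tomorrow
    if candidate <= now_utc:
        candidate += timedelta(days=1)
    return candidate


# 5:30pm on May 3 at UTC-7 is already 00:30 UTC on May 4,
# so the next midnight-UTC run is May 5 at 00:00 UTC.
utc_minus_7 = timezone(timedelta(hours=-7))
run = next_daily_run(datetime(2019, 5, 3, 17, 30, tzinfo=utc_minus_7))
print(run)                          # 2019-05-05 00:00:00+00:00
print(run.astimezone(utc_minus_7))  # 2019-05-04 17:00:00-07:00
```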

And if we want to run our job outside its schedule, we can trigger it from the command line:

$ gcloud beta scheduler jobs run send_pythons_job

Next steps

There's lots more you can do with Cloud Functions + Cloud Scheduler! Follow the links below to learn how to:

All code © Google w/ Apache 2 license
