A cron job is a way to run a program on a server at a specified interval. This is often a small script that does some repeatable task, like collect metrics, check the health of a service, take a snapshot of some dataset, etc.
The cron utility uses a highly customizable configuration format, the crontab file, which lets us run essentially any command at whatever interval we want. You can have your script run as often as once a minute, or as infrequently as once a year.
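For instance, those two extremes might look like this in a crontab (the script paths here are hypothetical):

```
# every minute
* * * * * /usr/local/bin/collect_metrics.sh

# once a year: at midnight on January 1st
0 0 1 1 * /usr/local/bin/yearly_report.sh
```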
Most often, a cron job takes some input, does a little processing, and then generates some output. The input could be a web page, another application, a database, etc. The output might be anything from adding a row to a table in a database, to putting a file in storage, to sending a message.
## Using cron jobs
Cron jobs have a lot of uses: essentially anything you want to happen in a repeatable way can be made into a cron job.
One thing I like to use cron jobs for is to keep an eye on websites that I visit infrequently. For example, I'd like to see articles posted to Hacker News about serverless, but I don't have time to check the site every day and look at the title of every post to see if it's about serverless.
Instead, I can write a cron job to do this for me. The steps will be roughly as follows:
- Once a day, get the top stories from the API;
- Iterate over every story;
- If any of their titles match "serverless":
  - Send an email with the links
Here's the script:
```python
#!/usr/bin/python
import requests

from utils import send_email

api_url = 'https://hacker-news.firebaseio.com/v0/'
top_stories_url = api_url + 'topstories.json'
item_url = api_url + 'item/{}.json'


def scan_hacker_news():
    # Make a request to the API
    top_stories = requests.get(top_stories_url).json()

    serverless_stories = []

    # Iterate over every story
    for story_id in top_stories:
        story = requests.get(item_url.format(story_id)).json()
        if 'serverless' in story['title'].lower():
            serverless_stories.append(story)

    # If there are any, send an email
    if serverless_stories:
        send_email(serverless_stories)


if __name__ == '__main__':
    scan_hacker_news()
```
This makes a request to the API, iterates over every story, keeps the ones with "serverless" in the title, and sends me an email (we'll leave the `send_email` function as an exercise for the reader).
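If you'd like a starting point for that exercise, here's a minimal sketch of what `send_email` in `utils.py` might look like. The SMTP host, addresses, and subject line are all placeholder assumptions, not part of the original setup:

```python
import smtplib
from email.message import EmailMessage


def format_stories(stories):
    # Build a plain-text body with one "title: url" line per story
    return "\n".join(
        "{}: {}".format(s["title"], s.get("url", "")) for s in stories
    )


def send_email(stories):
    # Assumed addresses and SMTP host; replace with your own
    msg = EmailMessage()
    msg["Subject"] = "Serverless stories on Hacker News"
    msg["From"] = "cron@example.com"
    msg["To"] = "me@example.com"
    msg.set_content(format_stories(stories))
    with smtplib.SMTP("localhost") as server:
        server.send_message(msg)
```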
If I wanted to set this up as a cron job on my Linux machine, I would save it to a file (`send_me_pythons.py`), make it executable (`chmod 755 send_me_pythons.py`), and put it in my local bin (`/usr/local/bin/send_me_pythons.py`).
Then, I would edit my crontab (`crontab -e`) and add the following line:

```
0 0 * * * /usr/local/bin/send_me_pythons.py
```
This runs the script once a day at midnight. I like to use https://crontab.guru/ to ensure that my crontab is right before I set it. Here's a diagram of each field:
```
# 0 0 * * * /usr/local/bin/send_me_pythons.py
# │ │ │ │ │ └── run this script
# │ │ │ │ └──── every day of the week
# │ │ │ └────── every month
# │ │ └──────── every day of the month
# │ └────────── at hour zero
# └──────────── at minute zero
```
Once the crontab is set, my machine will pick up on the new cron job and run it as often as I've specified.
## Moving our script to the cloud
This is great, but it has one huge downside: now I have to keep my server up and running to ensure that the emails get sent, otherwise I'll be missing valuable serverless content. And it's further complicated by the fact that I either have to share this server with other workloads, or keep an entire server up and running just to execute this little script once a day! Not ideal.
Instead, let's take this function to the cloud. First, we'll turn it into a Google Cloud Function. We'll wrap our entire existing script in a Python function, and put it in a file called `main.py`:
```python
import requests

from utils import send_email

api_url = 'https://hacker-news.firebaseio.com/v0/'
top_stories_url = api_url + 'topstories.json'
item_url = api_url + 'item/{}.json'


def send_pythons(request):
    # Make a request to the API
    top_stories = requests.get(top_stories_url).json()

    serverless_stories = []

    # Iterate over every story
    for story_id in top_stories:
        story = requests.get(item_url.format(story_id)).json()
        if 'serverless' in story['title'].lower():
            serverless_stories.append(story)

    # If there are any, send an email
    if serverless_stories:
        send_email(serverless_stories)
```
Note: only the lines that need to happen every time the function is called actually need to be in the `send_pythons` function. The import statements only need to be executed once, when the function is loaded, and can be left outside the function.
Next, we'll define our dependencies. We're using `requests`, so we need to put it in our `requirements.txt`:
```
requests==2.20.0
```
Then, we can deploy this with the `gcloud` command line tool:

```
$ gcloud beta functions deploy send_pythons --runtime python37 --trigger-http
```
This will give us an endpoint, something like:

```
https://us-central1-[PROJECT_ID].cloudfunctions.net/send_pythons
```

Making an HTTP `GET` request to that endpoint will result in our function being run.
## Scheduling our script in the cloud
Now that we've got our script as a function in the cloud, all we need to do is schedule it. We can create a new Google Cloud Scheduler job with the `gcloud` command line tool:

```
$ gcloud beta scheduler jobs create http send_pythons_job \
    --schedule="0 0 * * *" \
    --uri=https://us-central1-[PROJECT_ID].cloudfunctions.net/send_pythons
```
This specifies a name for our job, `send_pythons_job`, which is unique per project. It also specifies the crontab schedule we set above, and points the job at the HTTP function we created.
We can list our job:

```
$ gcloud beta scheduler jobs list --project=$PROJECT

ID                LOCATION     SCHEDULE (TZ)        TARGET_TYPE  STATE
send_pythons_job  us-central1  0 0 * * * (Etc/UTC)  HTTP         ENABLED
```
And if we want to run our job out of schedule, we can do it from the command line:

```
$ gcloud beta scheduler jobs run send_pythons_job
```
## Next steps
There's lots more you can do with Cloud Functions + Cloud Scheduler! Follow the links below to learn how to:
- Schedule more complex tasks with Cloud Scheduler
- Use Cloud Scheduler and Pub/Sub to trigger a Cloud Function
- Send email from App Engine with Mailjet or SendGrid
All code © Google, licensed under the Apache 2.0 license.