GitHub Actions for semi-static web sites

Dave Cross - Nov 22 '20 - Dev Community

I've been using GitHub Pages to host static web sites for a few years. I wrote a brief introduction on how to do that a couple of years ago.

However, very few sites are completely static, so I still had to host many of my web sites in places where I could execute the code that runs them. But in the last couple of weeks, I've realised that there is another type of site that can be successfully hosted on GitHub Pages using GitHub Actions. I call these "semi-static" sites. I'll explain what I mean by that term, starting with an example.

I've written before about my CPAN Dashboard site. It's a site that allows CPAN authors to monitor the various continuous integration services that they use to develop their code. (For those of you who haven't heard of it - CPAN is the "Comprehensive Perl Archive Network", the site where Perl programmers can find thousands of libraries that extend the Perl language.)

This site is mostly static. There's a page for each author who uses the site. Those pages are mostly taken up by a big table. Each row in the table contains data about one of the author's modules. There are links to the code repo and the module's page on CPAN, the version number and release date of the most recent version of the module and a series of badges indicating the status of the module on various CI services. The list of an author's modules is generated by making a call to the MetaCPAN API.

The site currently has two other pages: the home page (which basically lists the authors using the site) and a page telling authors how they can add themselves to the site (which is by sending a pull request to the repo that hosts the site).

As I say, the site is mostly static. There are only a few ways that the site can change.

  • I change the information in one of the static pages
  • I change the formatting of the site
  • A new author sends a pull request to add themself to the site
  • An author who uses the site releases a new module (or gives up ownership of an existing one)
  • An author who uses the site releases a new version of a module (meaning that the version number and last-released date need to change)

There is a single program in the repo that can be used to rebuild the site in all of these circumstances.

If I change the site in some way (the first two items in the list), obviously I know that this change has been made and can run the regeneration program and commit the regenerated version.

If an author sends a pull request to add themself to the site, I can merge the pull request, then pull down the latest version and regenerate the site to add the page for the new author. But it would be nicer if I just had to merge the pull request and the rebuild was handled automatically.

But I can't know when an author adds a new module or releases a new version of a module - well, not without monitoring CPAN rather more closely than I have time for. It would be better if the site were automatically rebuilt periodically (say once an hour) and checked in if something has changed. And that's what I can now do - thanks to GitHub Actions.

GitHub Actions workflows are configured by adding a YAML file to the .github/workflows directory in a code repo. The YAML configures what the workflow does and how it is triggered. Here's the current version of the workflow file for my dashboard repo:

name: Generate web page

on:
  push:
    branches: '*'
  schedule:
    - cron: '7 */6 * * *'
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest

    container:
      image: perldocker/perl-tester:5.30   # https://hub.docker.com/r/perldocker/perl-tester

    steps:
    - name: Checkout
      uses: actions/checkout@v2

    - name: Install modules
      run: |
        cpanm --installdeps --notest .
    - name: Create pages
      run: |
        mkdir -p docs
        perl dashboard
    - name: Commit new page
      if: github.repository == 'davorg/dashboard'
      run: |
        # Record what changed; --porcelain gives stable, script-friendly output
        GIT_STATUS=$(git status --porcelain)
        echo "$GIT_STATUS"
        git config --global user.name 'Dave Cross'
        git config --global user.email 'dave@dave.org.uk'
        git add docs/
        # Only commit and push when the rebuild changed something
        if [ -n "$GIT_STATUS" ]; then
          git commit -m "Automated Web page generation"
          git push
        fi

I think it's pretty easy to understand, but let's go through it a section at a time.

We start with the on: key. That defines when the workflow will be triggered. In this case, we have three triggers.

  • If there's a push to any branch (most usefully, this is triggered when I merge a pull request)
  • On a cron schedule. The expression shown here runs every six hours - at seven minutes past the hour
  • And workflow_dispatch: adds a button to the action's page in the repo. Pressing this button will run the action at any time.
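To make the schedule trigger concrete, here is a small shell sketch (not part of the workflow) that expands a cron step value, such as the */6 in the hour field of the workflow file, into the hours it matches. The function name hours_for_step is made up for this illustration. GitHub evaluates cron schedules in UTC.

```shell
#!/bin/sh
# Hypothetical helper: list the hours (0-23) matched by a cron
# hour field of the form "*/STEP".
hours_for_step() {
    step=$1
    hour=0
    out=""
    while [ "$hour" -le 23 ]; do
        # Append the hour, adding a separating space after the first entry
        out="${out:+$out }$hour"
        hour=$((hour + step))
    done
    echo "$out"
}

hours_for_step 6   # prints: 0 6 12 18
```

With the expression '7 */6 * * *', the job therefore starts at 00:07, 06:07, 12:07 and 18:07 UTC. Changing the hour field to a plain * would make it run at seven minutes past every hour.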

The next section (jobs:) defines what we do. We're using the built-in ability to run in a Docker container (here, we use one of the official Perl images - which adds a number of useful Perl tools to a standard Ubuntu image). We check out the repo, install the dependencies using cpanm and then run the program (called dashboard) that regenerates the site. We then use git status --porcelain (which prints any changes in a stable, script-friendly format) to determine whether any of the files actually changed; and, if they did, we commit and push the changes and they will appear on the web site.
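The "commit only if something changed" check can be tried outside GitHub Actions. The following is a minimal sketch that runs in a throwaway repository. The function name commit_if_changed, the demo file and the example user details are all assumptions for illustration. Restricting the status check to docs/ is also a choice made in this sketch, not something the workflow above does.

```shell
#!/bin/sh
# Hypothetical helper: commit docs/ only when it differs from the last commit.
# Prints "committed" or "unchanged" so the caller can see what happened.
commit_if_changed() {
    if [ -n "$(git status --porcelain docs/)" ]; then
        git add docs/ &&
            git commit --quiet -m "Automated Web page generation" &&
            echo "committed"
    else
        echo "unchanged"
    fi
}

# Demonstration in a temporary repository (example names throughout)
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
git init --quiet
git config user.name 'Example User'
git config user.email 'user@example.com'
mkdir -p docs
echo "<p>v1</p>" > docs/index.html

first=$(commit_if_changed)    # docs/ is new, so there is something to commit
second=$(commit_if_changed)   # nothing has changed since the last commit
echo "$first $second"
```

An empty result from git status --porcelain means the rebuild produced exactly what is already committed, so the second call skips the commit rather than failing on an empty one.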

This solves all of my problems. If I change the site in some way and forget to regenerate it before committing, then the site will be regenerated automatically. When I merge a pull request from a new author, the site will be regenerated automatically. And the site will be regenerated automatically on every scheduled run, which will take care of the case when any of the authors' lists of modules have changed in any way.

So that's what I mean by a "semi-static" site. It's one where the pages stay the same most of the time, but there are a few well-defined events that can change the contents of the site. As long as those events can be mapped onto the various triggers for GitHub Actions, then the regeneration of the site can be handled automatically.
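One way to see that mapping is to look at which trigger started a given run. GitHub Actions sets the GITHUB_EVENT_NAME environment variable in each job, so a step could log the reason for a rebuild along these lines. This is a sketch only: the function name describe_trigger and the wording of the messages are made up here, and the dashboard workflow does not do this.

```shell
#!/bin/sh
# Hypothetical helper: explain which kind of site change a trigger covers.
describe_trigger() {
    case "$1" in
        push)              echo "site files changed or a pull request was merged" ;;
        schedule)          echo "periodic rebuild to pick up new releases" ;;
        workflow_dispatch) echo "manual rebuild requested from the Actions page" ;;
        *)                 echo "unhandled event: $1" ;;
    esac
}

# Outside GitHub Actions the variable is unset, so fall back to a manual run
describe_trigger "${GITHUB_EVENT_NAME:-workflow_dispatch}"
```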

Here's another example. I run another site called Planet Perl. This is an old-school web feed aggregator. It knows about a number of web feeds about Perl programming and it combines their content into a single page (and another web feed). Once again, there is a small list of events that can change the site:

  • I change the look and feel of the site
  • I add a new web feed (or someone submits a pull request that adds a web feed)
  • Every hour we poll all of the feeds and rebuild the site

I won't go through the workflow definition again, but here it is if you'd like to take a look (it's actually very similar to the previous one).

Please let me know in the comments if you can think of any other kinds of site that this approach would work for. Or if you can suggest any improvements to my system.

The original version of the CPAN Dashboard workflow was sent to me by Gabor Szabo. Many thanks to him for showing me how to do it.
