Building a Static Site Generator in 3 steps

Andreas - May 2 - DEV Community

Maybe you know this situation: You're no longer happy with your personal website, its look and feel doesn't represent you anymore, and the time has come for a relaunch. I reached this point recently and asked myself:

What do I actually need?

I've been through a bunch of CMSes, always using my own website as a playground for new technologies. That isn't a bad or uncommon approach for a web dev, but this time I wanted to reduce complexity and maintenance effort. And then I had this most brilliant, completely new, unheard-of, genius idea:

What if I'd just write vanilla HTML?! (ba dum, tss!)

My head was spinning. That'd mean tiny loading times, no server-side rendering or precompiling, and best of all: no dependencies, hence no build step! I was as happy as a kid on Christmas Eve. I felt like I had just made the invention of the century.

But then came the disillusionment: What about computed data? Or data retrieved via an API? I used to show some GitHub and DEV post stats on my website. Doing that with JavaScript would mean additional requests on page load again. And I most certainly didn't want to update my age manually every year (okay, that one wouldn't be too bad if I forgot 😅). My brain was frantically searching for a way out that would preserve its latest achievement. It finally had to accept that at least one layer of data retrieval and insertion was needed. But would that be possible without any dependencies?


It turned out that I basically wanted a tiny static site generator, and I could have chosen one of the many great existing solutions out there. But sometimes you just want to stay dependency-free, keep things simple, and still have them up to date and running long-term without having to worry about them.

I broke it down into 3 steps:

  1. Create the markup: Do the actual web design work. Write HTML and CSS with all the content I want.
  2. Retrieve and process the data: Write a script that calls all APIs, does all the data manipulation I need and inserts the results into my content.
  3. Automate it: Run the script from step 2 automatically to keep the content up to date.

Let's visualize that:

[Figure: Schema of a static site generator]

Still interested? Then let's get our hands dirty.

1. Creating the markup



# create the template file
touch template.html



Ahh, it feels good to write good ol' plain HTML again. Most IDEs have a shortcut for a basic HTML structure: in VS Code, just create an empty HTML file, type ! and hit Tab.

This step might be a bit more work than clicking "Install" on the next popular WordPress theme. But personally, I enjoy writing HTML and CSS and building small websites from scratch like this, giving them a personal touch. I even decided not to use any JavaScript, but that was more of a personal challenge than a necessity.

If you don't want to start from scratch, there are a lot of good vanilla HTML templates out there, e.g. HTML5 UP. For now, let's use this example:



<!-- template.html -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>My Website</title>
</head>
<body>
  <header>
    I'm a 36 y/o software engineer.
  </header>
  <main>
    <p>I love Open Source.</p>
    <p>I created 123 PRs on GitHub.</p>
  </main>
</body>
</html>



2. Retrieving and processing the data



# create the script file
touch build.sh



Now it gets interesting. To retrieve and manipulate data, I decided to simply use Bash. It's available on nearly every Linux distro, and its scripting language is powerful enough to retrieve data and put it into our HTML file. The commands assume the GNU versions of tools like date and sed that ship with most Linux distros. So for our example, a possible build script could look like this:



#!/usr/bin/env bash
# build.sh

GH_API_URL='https://api.github.com/graphql'
GITHUB_TOKEN='abc_AB12CD34...'

# call the GitHub GraphQL API via curl and store the JSON response
QUERY='{ viewer { pullRequests { totalCount } } }'
RESULT=$(curl -s -H 'Content-Type: application/json' -H "Authorization: bearer $GITHUB_TOKEN" -X POST -d "{\"query\": \"query $QUERY\"}" "$GH_API_URL")

# get the data we want
# age in whole years, approximated with 365-day years
let "AGE=($(date +%s) - $(date +%s -d 1987-06-05)) / 31536000"
# extract the number following "totalCount": from the JSON response
PR_COUNT=$(echo "$RESULT" | sed -E 's|.*"totalCount":([0-9]*).*|\1|')



What's happening here? We call the GitHub GraphQL API with curl and store the JSON response in $RESULT. Note that you'll need a personal access token, which you can generate in your GitHub developer settings. Since the response contains only one totalCount key, we can extract the number that follows it with sed and a little regex. Finally, let assigns the result of an arithmetic expression directly to a variable: here, the age is the difference between the current Unix timestamp and that of the birth date, in seconds, divided by the number of seconds in a 365-day year (365 × 24 × 60 × 60 = 31536000).
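To see what that sed extraction works on, the sketch below runs it against hard-coded strings instead of a live API call. Both JSON strings are made-up examples shaped like a GraphQL success and error response, and extract_pr_count is a hypothetical helper name. It also guards against one pitfall: if the request fails, for example because the token expired, the response has no totalCount, and the plain substitution used above prints the unmatched input unchanged, so the error JSON would end up in your HTML. With sed -n and the p flag, sed only prints when the pattern actually matched:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the number that follows "totalCount": in a JSON string.
# With -n, sed prints nothing unless the substitution matched (the p flag),
# so a response without totalCount yields an empty result instead of the raw JSON.
extract_pr_count() {
  printf '%s\n' "$1" | sed -n -E 's|.*"totalCount":([0-9]+).*|\1|p'
}

# Made-up sample responses: a successful GraphQL result and an error
SAMPLE_OK='{"data":{"viewer":{"pullRequests":{"totalCount":123}}}}'
SAMPLE_ERR='{"message":"Bad credentials"}'

PR_COUNT=$(extract_pr_count "$SAMPLE_OK")
echo "PR count: $PR_COUNT"

# Fall back to a placeholder instead of putting an error message into the page
ERR_COUNT=$(extract_pr_count "$SAMPLE_ERR")
echo "PR count on error: ${ERR_COUNT:-n/a}"
```

In build.sh, you would call it as PR_COUNT=$(extract_pr_count "$RESULT") and insert ${PR_COUNT:-n/a} (or whatever fallback you prefer).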

The last thing missing is inserting the data into our template. I decided to use the common {{...}} template variable notation (of course you can choose whatever you like) and modified template.html like this:



<!-- template.html -->
...
  <header>
    I'm a {{age}} y/o software engineer.
  </header>
  <main>
    <p>I love Open Source.</p>
    <p>I created {{pr_count}} PRs on GitHub.</p>
  </main>
...



To replace them, the script copies the template to index.html and uses sed with a substitution expression again:



# build.sh
...
cp template.html index.html
sed -i -e "s|{{age}}|$AGE|g;s|{{pr_count}}|$PR_COUNT|g" index.html



Et voilà! We now have a ready-to-be-served index.html containing a computed age and an API-retrieved pull request count.
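The sed substitution works fine as long as the values are plain numbers like the ones above. If you later insert arbitrary text, the characters sed treats specially in a replacement (the | delimiter, & and \) would break the command or change the output. And since cp followed by sed -i edits index.html in place, a visitor could in theory be served the file while it still contains raw placeholders. Here is a sketch of one way to handle both; render_template and sed_escape are hypothetical helper names, values are assumed to be single-line, and the finished file is swapped in with mv, which is a single rename when source and target are on the same file system:

```shell
#!/usr/bin/env bash

# Escape the characters that are special in a sed replacement:
# backslash, & (whole match) and | (the delimiter used below).
sed_escape() {
  printf '%s' "$1" | sed -e 's/[\\&|]/\\&/g'
}

# Hypothetical helper: render_template TEMPLATE OUTPUT key=value ...
# Replaces every {{key}} with its escaped value. The result is written to a
# temporary file next to OUTPUT and then moved into place, so OUTPUT is
# replaced in one rename instead of being edited while it may be served.
render_template() {
  local template="$1" output="$2"
  shift 2
  local script="" pair key value tmp
  for pair in "$@"; do
    key="${pair%%=*}"
    value="${pair#*=}"
    # one sed command per line, e.g. s|{{age}}|36|g
    script+="s|{{${key}}}|$(sed_escape "$value")|g"$'\n'
  done
  tmp=$(mktemp "${output}.XXXXXX") || return 1
  sed -e "$script" "$template" > "$tmp" || { rm -f "$tmp"; return 1; }
  # mktemp creates owner-only files; make the result readable for the web server
  chmod 644 "$tmp"
  if grep -q '{{' "$tmp"; then
    echo "warning: unreplaced placeholders left in $output" >&2
  fi
  mv "$tmp" "$output"
}

# Demo with a throwaway template in a temporary directory
demo_dir=$(mktemp -d)
printf '<p>I am {{age}} y/o. Motto: {{motto}}</p>\n' > "$demo_dir/template.html"
render_template "$demo_dir/template.html" "$demo_dir/index.html" "age=36" "motto=Fast & simple | no deps"
RENDERED=$(cat "$demo_dir/index.html")
echo "$RENDERED"
rm -r "$demo_dir"
```

In build.sh, the cp and sed lines could then become render_template template.html index.html "age=$AGE" "pr_count=$PR_COUNT".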

3. Configuration and automation

Let's now improve our Bash script and make it configurable. You might have noticed that the GitHub token and the birth date were hard-coded into the script. A much better approach, especially for sensitive data like the token, is to keep all configuration in a separate file that stays out of version control (e.g. by listing it in .gitignore). I decided to use a simple .env file, but you can use whatever suits your case:



# create a config file
touch .env




# .env
BIRTH_DATE=1987-06-05
GITHUB_TOKEN=ghp_ABCDEFGHIK123456789



To load this configuration into the Bash script, simply source it. That way, all config variables automatically become Bash variables. Keep in mind that source executes the file as Bash code, so values containing spaces need to be quoted:



# build.sh
source .env
...
let "AGE=($(date +%s) - $(date +%s -d "$BIRTH_DATE")) / 31536000"
...



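Dividing by 31536000 treats every year as exactly 365 days. Since leap days accumulate, the computed age ticks over roughly one day early per leap year passed, i.e. about nine days before a 36th birthday. If that bothers you, a sketch that compares calendar dates instead could look like this. It assumes BIRTH_DATE uses the YYYY-MM-DD format from the .env above; age_on is a hypothetical helper name, and the ${BIRTH_DATE:?...} line simply aborts the script with a message if the variable is missing or empty:

```shell
#!/usr/bin/env bash

# Hypothetical helper: age in whole years on a given day.
# Both arguments are dates in YYYY-MM-DD format.
age_on() {
  local birth="$1" today="$2"
  # 10# forces base 10, so parts like "08" are not read as octal numbers
  local by=$((10#${birth:0:4})) bmd=$((10#${birth:5:2}${birth:8:2}))
  local ty=$((10#${today:0:4})) tmd=$((10#${today:5:2}${today:8:2}))
  local age=$((ty - by))
  # Birthday (month and day) not reached yet this year: subtract one
  if (( tmd < bmd )); then
    age=$((age - 1))
  fi
  echo "$age"
}

# In build.sh, BIRTH_DATE would come from `source .env`; it is set inline here
# so the sketch runs on its own.
BIRTH_DATE=1987-06-05
# Abort with a message if BIRTH_DATE is missing or empty
: "${BIRTH_DATE:?BIRTH_DATE is not set in .env}"

AGE=$(age_on "$BIRTH_DATE" "$(date +%F)")
echo "Age today: $AGE"
```

Comparing the month and day as a single number (e.g. 0605 becomes 605) keeps the check to one integer comparison, and date +%F is only used to get today's date, so no GNU-specific date option is needed.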
Now that we have an HTML template and a configurable Bash script that generates a servable index.html, we can finally execute that script however and as often* as we like. You can run it manually, but you might as well automate the execution, e.g. with a cron job or GitHub Actions. This flexibility is a huge advantage, for example when you have to move your website to another server.

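* Well, not without limits, since APIs restrict the number of calls per time period. Just keep the frequency reasonable; I decided to run it once every 10 minutes. For reference, GitHub's GraphQL API grants a user 5,000 points per hour, while this interval means just 6 requests per hour.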
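If you go the cron route, a crontab entry for the 10-minute interval might look like the line printed below. This is a sketch: SITE_DIR is a placeholder for wherever your files live, build.log is a hypothetical log file name, and the script only prints the line. The commented-out command at the end is one common way to append it to your existing crontab.

```shell
#!/usr/bin/env bash

# Placeholder: directory containing template.html, .env and build.sh
SITE_DIR="/path/to/site"

# crontab fields: minute hour day-of-month month day-of-week, then the command.
# "*/10" in the minute field runs the build every 10 minutes; cd first so that
# the relative paths in build.sh (source .env, template.html) resolve correctly.
CRON_LINE="*/10 * * * * cd $SITE_DIR && bash build.sh >> $SITE_DIR/build.log 2>&1"
echo "$CRON_LINE"

# To install it while keeping your existing entries, you could run:
# ( crontab -l 2>/dev/null; echo "$CRON_LINE" ) | crontab -
```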

Wrapping it up

So what we built here is a very basic static site generator. Let's take a last look at the pros and cons of this approach, diff style:



+ Lightning fast, no blockers/requests on or after page load
+ Easy to maintain, no npm/composer update etc.
+ Flexible and (almost) tech and location independent
- Might be hard for some people to create/find an HTML template
- Not exactly beginner-friendly, requires knowledge of command line and handling raw data
- Might become less maintainable with a lot of pages



After diving into this, I can say it's still the best choice for my use case (a single-page website for a dev who loves the command line). If you want to have a look at my shiny new generated website, feel free:

https://devmount.com

And of course it's open source! It would be an honor if you used it as a template for your next little project:

https://github.com/devmount/devmount.com

You have something to add, need some explanation or found a critical aspect I didn't think of? Please let me know in the comments.

For convenience, here are the complete example files, if you'd like to fiddle around with them a bit:



<!-- template.html -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>My Website</title>
</head>
<body>
  <header>
    I'm a {{age}} y/o software engineer.
  </header>
  <main>
    <p>I love Open Source.</p>
    <p>I created {{pr_count}} PRs on GitHub.</p>
  </main>
</body>
</html>




# .env
BIRTH_DATE=1987-06-05
GITHUB_TOKEN=ghp_ABCDEFGHIK123456789




#!/usr/bin/env bash
# build.sh

# run from the script's own directory, so relative paths also work from cron
cd "$(dirname "$0")" || exit 1
source .env

GH_API_URL='https://api.github.com/graphql'

# call the GitHub GraphQL API via curl and store the JSON response
QUERY='{ viewer { pullRequests { totalCount } } }'
RESULT=$(curl -s -H 'Content-Type: application/json' -H "Authorization: bearer $GITHUB_TOKEN" -X POST -d "{\"query\": \"query $QUERY\"}" "$GH_API_URL")

# get the data we want
# age in whole years, approximated with 365-day years
let "AGE=($(date +%s) - $(date +%s -d "$BIRTH_DATE")) / 31536000"
# extract the number following "totalCount": from the JSON response
PR_COUNT=$(echo "$RESULT" | sed -E 's|.*"totalCount":([0-9]*).*|\1|')

# generate website and replace template variables
cp template.html index.html
sed -i -e "s|{{age}}|$AGE|g;s|{{pr_count}}|$PR_COUNT|g" index.html



Published: 2nd May 2024
