Archival cascades: a practical way to not break URLs
This week, I reduced our DigitalOcean hosting costs for Small Technology Foundation from ~$90/month to $5/month by moving our canonical source code repositories as well as a few other servers to the complimentary hosting provided to our not-for-profit by the Eclips.is initiative by Greenhost and Open Technology Fund.
As part of the move, I had to decide what to do with the three servers that were still running various parts of the Ind.ie web site:
The latest version of the Ind.ie web site, with a notice that we were now Small Technology Foundation (this was a Hugo site).
The site we had from 2013-2017 (this was a server-side-generated site using a custom engine I’d written in Node.js).
The labs site, which linked to a number of projects and held a few technical blog posts (this was a server-side-rendered site with a custom Express server I’d written in Node.js and which was deployed using dokku).
We haven’t been known as Ind.ie for the past three years, but that doesn’t mean we can just switch those servers off and be done with it. I’m hugely proud of our journey over the past eight years, most of all of the fact that, as hard as it has been, we are still going and will keep going for the foreseeable future. The Ind.ie site is an archive of the learning, iterating, and growing (in knowledge, understanding, and tools; not in size) that we’ve done through five of those years. Including…
I knew I wanted to collate the three servers into one and use Site.js to host static versions of them.
The latest version of the Ind.ie web site was the easiest to deal with. Site.js has native support for Hugo, so all I had to do was create a .hugo directory in my new site and copy the Hugo site into it.
So my archived Ind.ie site looked like this:
ind.ie/
╰ .hugo
   ╰ (contents of the latest ind.ie site)
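As a rough sketch of that step, the following shell commands build the same layout in a temporary directory. The old-hugo-site directory and its two files are placeholders invented for illustration; in practice you would copy your real Hugo source instead.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so the sketch touches nothing real.
work=$(mktemp -d)

# Placeholder for an existing Hugo project (hypothetical contents).
old="$work/old-hugo-site"
mkdir -p "$old/content"
printf 'baseURL = "https://ind.ie/"\n' > "$old/config.toml"
printf '# Hello\n' > "$old/content/_index.md"

# The new site: the Hugo source goes into a .hugo directory.
site="$work/ind.ie"
mkdir -p "$site/.hugo"

# "old/." copies the directory's contents (dotfiles included), not the directory itself.
cp -R "$old/." "$site/.hugo/"

# List the result relative to the site root.
(cd "$site" && find . | sort)
```

The only convention taken from the post is the .hugo directory name; everything else here is scaffolding.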
Next, I needed to serve the statically-generated contents of the site from 2013-2017. Now you’re probably thinking: “but Site.js is already serving the Hugo site and you need to serve that static content from the same namespace. How do you do that?”
Well, there are two ways I could have done it. Since Site.js simply serves any content in the root of your site as static content, I could have just copied the content there. But Site.js also has a feature specifically for this use case called archival cascades.
If you have static archives of previous versions of your site, you can have Site.js automatically serve them for you. Just put them into folders named .archive-1, .archive-2, and so on.
If a path cannot be found in your current site, Site.js will search for it first in .archive-2 and, if it cannot find it there either, in .archive-1.
Paths in your current site will override those in .archive-2, and those in .archive-2 will, similarly, override those in .archive-1.
This way, you create a cascade of archives where you can serve static snapshots of older content.
Using the archival cascade, old links will never die but if you do replace them with newer content in newer versions, those will take precedence.
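To make the precedence concrete, here is a small shell sketch of the lookup order described above. It is an illustration only, not how Site.js is implemented: the resolve function is hypothetical, the current site is simplified to plain files in the site root, archives are assumed to be ordered numerically (highest number first), and directory index handling is left out.

```shell
#!/bin/sh
set -eu

# resolve SITE_ROOT REQUEST_PATH
# Prints the file that would win the cascade, or returns 1 if no layer has it.
resolve() {
  resolve_root=$1
  resolve_path=${2#/}   # strip the leading slash from the request path

  # 1. The current site always takes precedence.
  if [ -f "$resolve_root/$resolve_path" ]; then
    printf '%s\n' "$resolve_root/$resolve_path"
    return 0
  fi

  # 2. Then the archives, newest (highest number) first.
  resolve_layers=$(
    for dir in "$resolve_root"/.archive-*; do
      if [ -d "$dir" ]; then printf '%s\n' "${dir##*.archive-}"; fi
    done | sort -rn
  )
  for n in $resolve_layers; do
    if [ -f "$resolve_root/.archive-$n/$resolve_path" ]; then
      printf '%s\n' "$resolve_root/.archive-$n/$resolve_path"
      return 0
    fi
  done
  return 1
}

# Build a tiny three-layer site in a temporary directory.
site=$(mktemp -d)
mkdir -p "$site/.archive-1" "$site/.archive-2/labs"
echo current > "$site/index.html"
echo older   > "$site/.archive-2/index.html"
echo labs    > "$site/.archive-2/labs/index.html"
echo oldest  > "$site/.archive-1/index.html"
echo about   > "$site/.archive-1/about.html"

for p in /index.html /labs/index.html /about.html /missing.html; do
  if found=$(resolve "$site" "$p"); then
    echo "$p -> ${found#"$site"/}"
  else
    echo "$p -> 404"
  fi
done
```

In this sketch /index.html exists in all three layers and the current site wins, while /about.html only survives in the oldest archive and is still served from there.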
So, after this step, my archive site structure looked like this:
ind.ie/
├ .hugo
│  ╰ (contents of the latest ind.ie site)
╰ .archive-1
   ╰ (contents of the ind.ie site from 2013-2018)
Taking a static snapshot using wget
Since the labs site was server-side rendered, I needed to get a static snapshot of it. [2] The easiest way I could find of doing that was to use the handy wget command.
I simply ran the server on my local development machine (on port 3000) and ran the following command to save a static snapshot of it:
wget --recursive --domains localhost --no-parent --page-requisites http://localhost:3000
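For readers unfamiliar with these options, the sketch below rebuilds the same command one flag at a time, with a comment on what each flag does. By default it only prints the command; RUN_SNAPSHOT is a hypothetical switch added here so that nothing is downloaded unless a server is actually listening on port 3000.

```shell
#!/bin/sh
set -eu

set --                              # start from an empty argument list
set -- "$@" --recursive             # follow the links found in each fetched page
set -- "$@" --domains localhost     # only follow links whose host is localhost
set -- "$@" --no-parent             # never ascend above the starting directory
set -- "$@" --page-requisites       # also fetch the images, CSS, and scripts pages need
set -- "$@" http://localhost:3000   # the locally running server

snapshot_command="wget $*"
echo "$snapshot_command"

# RUN_SNAPSHOT is a made-up guard for this sketch, not a wget feature.
if [ "${RUN_SNAPSHOT:-0}" = "1" ]; then
  wget "$@"
fi
```

One thing to keep in mind: wget's recursion depth defaults to five levels, so a deeply nested site may need an explicit --level option to be captured in full.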
Once I had the static snapshot, I just added it to the archival cascade. So my final site structure looked like this:
ind.ie/
├ .hugo
│  ╰ (contents of the latest ind.ie site)
├ .archive-1
│  ╰ (contents of the ind.ie site from 2013-2018)
╰ .archive-2
   ╰ /labs
      ╰ (contents of the ind.ie labs site)
Then, I set up a server running Site.js, pointed the ind.ie domain to it, and synced the site over.
Don’t break URLs
It’s all too easy to just turn a server off and forget about it, but the web doesn’t forget. What you leave behind is a bunch of dead links. It’s not always practical to keep everything running forever, but if you want to avoid breaking links, archival cascades and 404 to 302 support in Site.js should help.
[1] Some of the links to articles on Labs might be broken as they refer to a forum that no longer exists. This was hosted by a commercial third party and, sadly, we weren’t able to get a usable static export of our data at the time.
[2] As an alternative, if I had wanted to, I could have kept the server running somewhere and used the native 404 to 302 support in Site.js to keep serving the old site.