Kill dead links in Jekyll: stable permalinks, redirect_from, and a CI link checker


You rename a post. The title was wrong, the slug was uglier than the title, and now it’s fixed. You feel good for about a day, until someone clicks a link from six months ago and lands on a 404. The link wasn’t broken when they saved it. You broke it.

Dead links are the tax you pay for editing your own site. The fix isn’t “never rename things” — it’s three pieces of config and one CI job, so that when you do rename things, the old URL still works and a robot yells at you before a human finds the hole.

Here’s the whole kit. None of it needs a server.

Step 1: Make permalinks that don’t move

Many 404s come from URLs that change shape for no reason: .html one day, a trailing slash the next, a date prefix you didn’t ask for. Pin the shape once in _config.yml:

permalink: pretty          # /my-post/  not  /my-post.html
url: https://example.com   # your real production domain
baseurl: ""                # "" for a root domain; "/repo" for project pages
plugins:
  - jekyll-sitemap
  - jekyll-redirect-from

permalink: pretty gives every page a clean directory-style URL with a trailing slash, so links stop flipping between .html and slash forms. jekyll-sitemap writes a sitemap.xml so crawlers can rediscover pages that moved. jekyll-redirect-from is the one that does the actual saving; see Step 2.

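You’ll know it worked when bundle exec jekyll build produces _site/my-post/index.html (a directory with an index) instead of _site/my-post.html (a bare file). Pretty permalinks make directories. One caveat: for posts in _posts, the built-in pretty style keeps the categories and date (/:categories/:year/:month/:day/:title/), so the directory is nested under the date. Set permalink: /:title/ instead if you want the bare slug.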
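To check the whole build rather than one page, something like the following could list every page still emitted as a bare .html file. It’s a minimal sketch: bare_html_pages is a hypothetical helper name, and it assumes the default _site output directory and that 404.html is the only bare file you expect.

```shell
# Hypothetical helper: list built pages that are bare .html files instead of
# directory-style index.html pages. 404.html is expected to be a bare file.
bare_html_pages() {
  site_dir="${1:-_site}"   # assumption: Jekyll's default output directory
  find "$site_dir" -type f -name '*.html' \
    ! -name 'index.html' ! -name '404.html' | sort
}

# Usage after `bundle exec jekyll build`:
#   bare_html_pages _site
```

No output means every page came out as a directory with an index. Anything it prints is a URL that still ends in .html.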

Step 2: Keep the old URL alive when you rename

This is the move. When you change a slug, you don’t abandon the old path — you make the new page answer to both names. Add redirect_from to the renamed file’s front matter:

---
title: "The Better Title"
permalink: /the-better-title/
redirect_from:
  - /the-old-title/
  - /2024/03/10/the-old-title/   # an old dated path counts too
---

jekyll-redirect-from generates a tiny stub page at each old path that bounces the visitor to the new one. No .htaccess, no server rules — it ships as plain HTML, which is exactly what GitHub Pages serves.

You’ll know it worked when _site/the-old-title/index.html exists after a build and contains a <meta http-equiv="refresh"> pointing at /the-better-title/. The old link resolves; the reader never sees the seam.
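If you’d rather not open the stub by hand, that check can be scripted. This is a sketch under stated assumptions: check_redirect_stub is a hypothetical helper, the old path is written with a trailing slash, and the stub carries the new path somewhere on its meta refresh line.

```shell
# Hypothetical helper: succeed only if the stub for an old path exists in the
# built site and its meta refresh line mentions the new path.
# Assumes old_path ends in "/" so the stub lives at <old_path>index.html.
check_redirect_stub() {
  site_dir="$1"; old_path="$2"; new_path="$3"
  stub="$site_dir/${old_path#/}index.html"
  if [ ! -f "$stub" ]; then
    echo "MISSING $old_path"
    return 1
  fi
  if grep -i 'http-equiv="refresh"' "$stub" | grep -qF "$new_path"; then
    echo "OK $old_path -> $new_path"
  else
    echo "WRONG $old_path"
    return 1
  fi
}

# Usage after a build:
#   check_redirect_stub _site /the-old-title/ /the-better-title/
```

The non-zero return on a missing or wrong stub means you can chain it after the build and let the shell stop on the first broken redirect.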

One rule that keeps this sane: one canonical URL per page, every other path redirects to it. Don’t give a page two live permalinks and hope. Pick the real one, redirect the rest.

Step 3: A 404 page that’s a map, not a wall

Even with redirects, some links die for real — external sites vanish, you delete a page on purpose. Make the dead end useful. Put this in 404.html at your site root:

---
permalink: /404.html
---
<main style="max-width:720px;margin:3rem auto;padding:0 1rem">
  <h1>404 — that page moved or never existed</h1>
  <p>Two doors out:</p>
  <ul>
    <li><a href="{{ '/' | relative_url }}">Home</a></li>
    <li><a href="{{ '/sitemap.xml' | relative_url }}">Sitemap (every live page)</a></li>
  </ul>
  <h2>Recent posts</h2>
  <ul>
    {% for post in site.posts limit:5 %}
      <li><a href="{{ post.url | relative_url }}">{{ post.title }}</a></li>
    {% endfor %}
  </ul>
</main>

permalink: /404.html is what GitHub Pages looks for to serve a custom 404. The relative_url filter matters: it prepends your baseurl, so the links work whether you’re on a root domain or a project page. (Hard-code a leading-slash path here and you’ll reproduce the exact bug described under “The part where it broke” below.)

You’ll know it worked when visiting a made-up path on the deployed site shows your page and its links go somewhere real.

Catch them before CI does: a grep dead-link pre-check

Before you push and wait for CI, you can find broken internal links with nothing but grep and the built _site/ directory. Pull every internal href and check whether the target file exists on disk. Here’s the idea, run against a tiny throwaway site so you can see the shape of the output:

# Make a throwaway "site" with two real pages and one broken link.
site="$(mktemp -d)"
cd "$site"

cat > index.html <<'HTML'
<a href="/about.html">About</a>
<a href="/team.html">Team</a>
<a href="https://example.com/">External (skipped)</a>
HTML

cat > about.html <<'HTML'
<a href="/index.html">Home</a>
HTML

# Pull internal hrefs (start with a single /), strip to a path,
# and report whether each target exists on disk.
grep -rhoE 'href="/[^"/][^"]*"' . \
  | sed -E 's@^href="/([^"]*)"@\1@' \
  | sort -u \
  | while read -r path; do
      [ -f "$path" ] && echo "OK    /$path" || echo "DEAD  /$path"
    done

rm -rf "$site"

We ran that; here’s the real output:

OK    /about.html
OK    /index.html
DEAD  /team.html

team.html is the link with no file behind it — the 404 you would have shipped. The external https:// link is skipped on purpose: grep can’t tell you if a remote host is up, only whether a local file exists. Point this at your real _site/ after a build (with permalink: pretty, internal targets look like /my-post/, so check for $path/index.html too) and it’ll list your broken internal links in about a second.
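For a real _site/ built with pretty permalinks, the same idea might look like the sketch below. check_internal_links is a hypothetical name, and the approach assumes a target counts as alive if it is either a file or a directory containing index.html. It also drops #fragment and ?query suffixes before looking on disk.

```shell
# Sketch: report internal hrefs in a built site and whether each target exists.
# A target is present if it is a file, or a directory with an index.html.
check_internal_links() {
  site_dir="${1:-_site}"   # assumption: Jekyll's default output directory
  grep -rhoE --include='*.html' 'href="/[^"/][^"]*"' "$site_dir" \
    | sed -E 's@^href="/([^"#?]*).*@\1@' \
    | sort -u \
    | while read -r path; do
        target="$site_dir/$path"
        if [ -f "$target" ] || [ -f "${target%/}/index.html" ]; then
          echo "OK    /$path"
        else
          echo "DEAD  /$path"
        fi
      done
}

# Usage after `bundle exec jekyll build`:
#   check_internal_links _site | grep '^DEAD'
```

Piping through grep '^DEAD' keeps the output to the links that need fixing.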

This is a smoke test, not the full check. It doesn’t follow external links, parse redirects, or understand baseurl. That’s the CI job’s job.

Step 4: The CI job that follows every link

Once per pull request, have a real link checker crawl everything — internal and external. lychee is fast and ships as a GitHub Action:

name: link-check
on:
  pull_request:
  schedule:
    - cron: '0 3 * * 1'   # weekly sweep catches bit-rot in old posts
jobs:
  lychee:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: lycheeverse/lychee-action@v2
        with:
          args: >-
            --no-progress
            --accept 200,204,206,301,302,308
            --exclude-mail
            --timeout 20
            './**/*.md' './**/*.html'
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

The --accept list is the part people skip and then wonder why every redirect is “broken”: 301/302/308 are redirects, not failures, so a link that passes through a healthy server-side redirect should pass. (A redirect_from stub is a different animal: it’s a static HTML page, so the host answers 200 and the meta refresh does the bouncing.) The scheduled run matters because external links rot on their own timeline: a host that was fine at merge can vanish three months later, and the Monday sweep finds it.

You’ll know it worked when a PR that introduces a typo’d link gets a red check listing the exact bad URL, and a clean PR stays green.

The part where it broke

Here’s the one that cost a real afternoon, and it had nothing to do with renamed posts.

I deployed to a project site — username.github.io/myrepo/ — and every link 404’d. Home, posts, CSS, all of it. The site built clean locally. It built clean in CI. It served a wall of 404s in production.

The cause: baseurl was "". On a root domain that’s correct. On a project page the whole site lives under /myrepo/, so a link written as /about/ resolves to username.github.io/about/ — which doesn’t exist — instead of username.github.io/myrepo/about/. Every absolute internal path was off by the repo name.

Two fixes, and you need both:

Set the prefix in _config.yml:
```
baseurl: "/myrepo"
```

Stop hard-coding leading-slash paths in templates. Run them through a filter that prepends baseurl:

<a href="{{ '/about/' | relative_url }}">About</a>

not

<a href="/about/">About</a>   <!-- ignores baseurl, 404s on project pages -->

The trap is that with baseurl empty, jekyll serve serves the site at the root, so a hard-coded /about/ works on your laptop and breaks only in production. Once baseurl is set, don’t reach for --baseurl '' to make local links work again; that hides the bug. Serve with the real baseurl so local matches prod:

bundle exec jekyll serve   # honors baseurl from _config.yml

If links work locally only when you delete baseurl, you have hard-coded paths waiting to 404 the moment you deploy under a subpath.
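One way to hunt for those hard-coded paths before deploying is a plain grep over your templates. This is a rough sketch: find_hardcoded_paths is a hypothetical helper, and the directory names in the usage line are assumptions about a typical Jekyll layout. It flags href and src values that start with a single slash. Values that start with a Liquid expression, such as {{ '/about/' | relative_url }}, are not matched.

```shell
# Sketch: list lines with a hard-coded root-relative href or src, meaning a
# value that starts with a single "/" and not with a Liquid expression.
# Prints file:line:content for each hit; exits non-zero when nothing matches.
find_hardcoded_paths() {
  grep -rnE --include='*.html' --include='*.md' \
    '(href|src)="/[^"/]' "$@"
}

# Usage from the repo root (directory names are assumptions):
#   find_hardcoded_paths _layouts _includes
```

Each hit is a candidate for the relative_url filter. Protocol-relative URLs (starting with //) are deliberately skipped.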

The honest accounting

Permalinks, redirects, a 404 map, a grep pre-check, and one CI job. The config is maybe twenty lines total and it’s mostly copy-paste.

What it buys you isn’t speed; it’s that editing your site stops being dangerous. Rename a post, the old link redirects. Delete a page, the 404 hands the reader a map. Ship a typo’d link, CI catches it before a human does. The grep check is the cheapest of the lot: no network, no extra tooling, only the question “does this file exist,” answered in about a second against the built _site/.

Rename freely. Redirect the old paths. Let the robot find the holes.

Layout	`article`
Collection	`hacks`
Path	`_hacks/kill-dead-links-jekyll-permalinks.md`
URL	`/hacks/kill-dead-links-jekyll-permalinks/`
Date	`2025-08-31`
