dead link

When blog articles have been online for more than 10 years, you need some kind of tool to check them for dead links.

Having googled a bit, I didn’t find anything convincing, so I wrote a quick and dirty solution that did the job for me.

You start it with

python3 path/to/md/files/

and it iterates over all .md files in path/to/md/files, looks for links and images in your articles, sends an HTTP HEAD request to each one, and prints everything that does not look right.
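Here is a minimal sketch of that loop, assuming nothing beyond Python's standard library. It is not the downloadable script itself: the names find_markdown_files, check_link, main and URL_PATTERN are made up for illustration, and the pattern is a deliberately naive stand-in that only catches absolute http(s) URLs.

```python
import http.client
import re
import urllib.request
from pathlib import Path

# Naive stand-in for the link extraction (an assumption, not the
# script's real regex): any absolute http(s) URL.
URL_PATTERN = re.compile(r"https?://[^\s)\"'<>]+")


def find_markdown_files(root):
    """Return every .md file below root, sorted for stable output."""
    return sorted(Path(root).rglob("*.md"))


def check_link(url, timeout=10):
    """Send an HTTP HEAD request and return (ok, detail)."""
    try:
        request = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "Mozilla/5.0"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400, f"HTTP {response.status}"
    except (OSError, ValueError, http.client.HTTPException) as err:
        # urllib raises HTTPError (an OSError) for 4xx/5xx answers, other
        # OSError subclasses for timeouts or DNS failures, and ValueError
        # for strings that are not URLs at all.
        return False, f"Got exception {err}"


def main(root):
    """Check every URL in every .md file and print the ones that fail."""
    for md_file in find_markdown_files(root):
        text = md_file.read_text(encoding="utf-8", errors="replace")
        for url in URL_PATTERN.findall(text):
            ok, detail = check_link(url)
            if not ok:
                print(f"{md_file}: {url}: {detail}")
```

Calling main("path/to/md/files/") would then print one line per link that did not answer cleanly.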

Some words of caution:

This is just an 80% solution. It will give you some false negatives:

  • It uses a regex to find the links. It matches both Markdown-style links and a href= style links.
  • It sends a basic User-Agent, but some sites such as Google don’t allow crawling, so you’ll see 405 Method Not Allowed.
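To give an idea of what regex-based link finding looks like, and why it is only an 80% solution, here is a rough sketch. The two patterns and the extract_links helper are my own illustration, not the patterns from the script.

```python
import re

# Markdown links and images: [text](url) and ![alt](url "optional title").
MARKDOWN_LINK = re.compile(r"!?\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)")
# HTML links: href="..." with single or double quotes.
HTML_LINK = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_links(text):
    """Return link targets (Markdown first, then HTML) without duplicates."""
    targets = MARKDOWN_LINK.findall(text) + HTML_LINK.findall(text)
    # dict.fromkeys keeps the first occurrence of each target, in order.
    return list(dict.fromkeys(targets))


sample = (
    'See [Hexo](https://hexo.io/) and ![logo](/images/logo.png "The logo"), '
    'or <a href="https://example.com/page">this</a>. Again: [Hexo](https://hexo.io/)'
)
print(extract_links(sample))
# → ['https://hexo.io/', '/images/logo.png', 'https://example.com/page']
```

The limits show up quickly: a URL that itself contains parentheses gets cut off at the first closing parenthesis, which is exactly the kind of miss you have to live with when using a regex instead of a real Markdown parser.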

Screw that, I want to use it anyway

Here’s the script to download. And here’s how it looks (it even prints the ✔ in green and the x in red). If you use Hexo, you can call the script exactly like that:

$ ./ source http://localhost:4000
✔ x
-------------------------------
Got exception timed out
Got exception timed out
Got exception HTTP Error 403: Forbidden
✔ ✔ ✔ ✔ ✔ ✔
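The colored marks can be produced with plain ANSI escape codes. This is a small sketch, assuming a terminal that understands them; the mark helper and the constant names are invented for illustration and are not taken from the script.

```python
# ANSI escape sequences for foreground colors.
GREEN = "\033[32m"
RED = "\033[31m"
RESET = "\033[0m"


def mark(ok):
    """Return a green check mark for a working link, a red x otherwise."""
    symbol, color = ("✔", GREEN) if ok else ("x", RED)
    return f"{color}{symbol}{RESET}"


# One mark per checked link, all on the same line.
print(" ".join(mark(ok) for ok in (True, True, False)))
```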