How to check for broken links in markdown files
Having blog articles up >10 years needs some kind of tool to check for dead links.
Having googled a bit I didn’t find anything convincing. So I just created a very dirty solution which did the job for me.
You start it with
python3 link_checker.py path/to/md/files/ http://mysite.com
and it iterates over all .md
files in path/to/md/files
for links and images in your articles, sends a HTTP HEAD
request and prints everything which does not look right
Some words of caution:
This is just a 80% solution. It will give you some false negatives:
- it does regex to find the links. It finds both markdown styled links and
a href=
styled links - it sends a basic user-agent, but some sites such as google don’t allow crawling so you’ll see
405 Method not allowed
Screw that, I want to use it anyway
Here’s the script to download. And here’s how it looks (it even put the ✔ in green and the x in red) (if you use Hexo you can exactly call the script like that):
$ ./link_checker.py source http://localhost:4000
How-to-set-up-raspberry-pi-headless-with-ssh-and-wifi.md ✔
Tagsystems-performance-tests.md x
-------------------------------
http://pastie.org/5480706 Got exception timed out
http://pastie.org/5480722 Got exception timed out
http://www.webmasterworld.com/forum23/3557.htm Got exception HTTP Error 403: Forbidden
How-to-attach-a-file-to-google-spreadsheet.md ✔
Django-Serve-big-files-via-fcgid.md ✔
Python-Print-list-of-dicts-as-ascii-table.md ✔
Tags-Database-schemas.md ✔
Tags-with-MySQL-fulltext.md ✔
How-to-reset-Jambox-when-bluetooth-completely-stopped-working.md ✔
Comments