How to check for broken links in markdown files
After keeping blog articles online for more than 10 years, I needed some kind of tool to check for dead links.
Having googled a bit, I didn’t find anything convincing. So I just wrote a quick and dirty solution which did the job for me.
You start it with
python3 link_checker.py path/to/md/files/ http://mysite.com
and it iterates over all .md files in path/to/md/files, looks for links and images in your articles, sends an HTTP HEAD request to each of them and prints everything which does not look right.
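To make the idea concrete, here is a minimal sketch of those two steps: collecting the .md files and sending a HEAD request. It is not the actual link_checker.py. The function names, the user-agent string, the timeout and the recursive directory search are all assumptions made for illustration.

```python
import urllib.error
import urllib.request
from pathlib import Path


def find_markdown_files(root):
    """Return all .md files below root (recursive search is an assumption)."""
    return sorted(Path(root).rglob("*.md"))


def build_head_request(url, user_agent="link-checker-sketch/0.1"):
    """Build a HEAD request so only the headers are fetched, not the body."""
    # The user-agent value is a made-up placeholder.
    return urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )


def check_url(url, timeout=10):
    """Return None if the URL answers, otherwise a short error description."""
    try:
        with urllib.request.urlopen(build_head_request(url), timeout=timeout):
            return None
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # HTTPError (403, 404, 405, ...) is a subclass of URLError,
        # timeouts and refused connections arrive as URLError or OSError.
        return f"Got exception {exc}"
```

A HEAD request keeps the check cheap because the server does not have to send the page body, which matters when an old blog has accumulated hundreds of links.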
This is just an 80% solution. It will give you some false results (working links reported as broken, or links missed entirely):
- it uses regexes to find the links. It finds both markdown styled links and <a href=...> styled links
- it sends a basic user-agent, but some sites such as google don’t allow crawling, so you’ll see 405 Method not allowed
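The regex approach from the first point could look roughly like the sketch below. The two patterns and the extract_links helper are hypothetical and not taken from the actual script. Resolving relative links against the site URL given on the command line is also an assumption about what that argument is used for.

```python
import re
from urllib.parse import urljoin

# [text](url) and ![alt](url); an optional "title" after the URL is skipped.
MARKDOWN_LINK = re.compile(r'!?\[[^\]]*\]\(\s*<?([^)\s>]+)>?[^)]*\)')
# <a href="url"> or <a href='url'>
HTML_LINK = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(markdown_text, base_url):
    """Return absolute URLs for all links and images found in the text."""
    urls = []
    for pattern in (MARKDOWN_LINK, HTML_LINK):
        for match in pattern.finditer(markdown_text):
            target = match.group(1)
            # In-page anchors and mail addresses cannot be checked over HTTP.
            if target.startswith(("#", "mailto:")):
                continue
            # Relative targets such as /images/a.png are resolved against the site.
            urls.append(urljoin(base_url, target))
    return urls
```

This also shows where the 80% comes from: a URL that itself contains a closing parenthesis, as some Wikipedia URLs do, gets cut off at that parenthesis, and reference-style markdown links are not found at all.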
Here’s the script to download. And here’s how it looks (it even puts the ✔ in green and the x in red). If you use Hexo, you can call the script exactly like that:
$ ./link_checker.py source http://localhost:4000
How-to-set-up-raspberry-pi-headless-with-ssh-and-wifi.md ✔
Tagsystems-performance-tests.md x
-------------------------------
http://pastie.org/5480706 Got exception timed out
http://pastie.org/5480722 Got exception timed out
http://www.webmasterworld.com/forum23/3557.htm Got exception HTTP Error 403: Forbidden
How-to-attach-a-file-to-google-spreadsheet.md ✔
Django-Serve-big-files-via-fcgid.md ✔
Python-Print-list-of-dicts-as-ascii-table.md ✔
Tags-Database-schemas.md ✔
Tags-with-MySQL-fulltext.md ✔
How-to-reset-Jambox-when-bluetooth-completely-stopped-working.md ✔
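The green and red marks can be produced with plain ANSI escape codes. The following is only a sketch of how such a per-file report might be built. The format_report name, the (url, message) pairs and the separator width are assumptions, not the script's real internals.

```python
GREEN = "\033[32m"
RED = "\033[31m"
RESET = "\033[0m"


def format_report(filename, errors):
    """Format one file's result; errors is a list of (url, message) pairs."""
    if not errors:
        # No broken links: file name followed by a green check mark.
        return f"{filename} {GREEN}✔{RESET}"
    # Broken links: red x, a separator line, then one line per failing URL.
    lines = [f"{filename} {RED}x{RESET}", "-" * 31]
    lines += [f"{url} {message}" for url, message in errors]
    return "\n".join(lines)
```

Printing the result of format_report for every file gives output in the same shape as above. The colors only show up in terminals that understand ANSI escape sequences.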