Peas, Carrots, Green Beans, and Gray Matter

Combating Link Rot in a Free E-Book Directory

Running a directory of free programming e-books isn’t as easy as one might think. It’s a lot of work, especially when it comes to handling resources that disappear because of link rot.

Since I don’t host any book files myself, and only link to legal sources, usually on the author’s or publisher’s website, I have needed an effective way to find out easily when a link has gone dead.

I have previously used the mShots WordPress plugin to help with this, using the resulting dynamic screenshot to tell if the page was still there. A page-by-page scroll through my site would reveal the entries that needed to be updated or removed.

While that made initial posting much faster (not having to take the time to make and upload my own screenshots), the updating process when a link goes dead is still tedious and time-consuming.

Then I started running Xenu’s Link Sleuth on my site’s pages to automate finding all the dead links. That was much faster than flipping page by page and looking at the screenshots, but it did nothing to shorten the time or work required to fix the broken links.
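For readers who would rather script this step, here is a minimal Python sketch of the kind of check a link checker performs. It is not how Xenu works internally; the function names `classify` and `check_link` and the verdict strings are my own assumptions, and it uses only the standard library.

```python
import urllib.error
import urllib.request


def classify(status, requested_url, final_url):
    """Turn an HTTP status and the URL we ended up at into a coarse verdict."""
    if status in (404, 410):
        return "dead"
    if 200 <= status < 300:
        # urllib follows redirects, so a different final URL means the page moved.
        return "ok" if final_url == requested_url else "moved"
    return "check manually"  # e.g. 403 or 5xx: possibly temporary or bot blocking


def check_link(url, timeout=10):
    """Send a HEAD request for url and return a verdict string."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify(response.status, url, response.geturl())
    except urllib.error.HTTPError as err:
        return classify(err.code, url, url)
    except OSError:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        return "dead"
```

A "moved" verdict still needs a human look, since some sites redirect missing pages to their home page, and some servers reject HEAD requests even though the page is fine.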

Fixing a broken link involves the following:

  1. Visiting the original link to verify that it is in fact broken.
  2. If it is broken, looking around on the original site to see if the resource has just been moved to another part of the website.
  3. If it has just been moved, updating the link in the directory entry.
  4. If I can’t find the content on the original site, trying to look for a copy of it on archive.org.
  5. Creating a short link that points to the copy on archive.org (if I find one).
  6. Changing the link in the directory entry to use the newly created short link.
  7. Then, finally, making a screenshot of the page, using the copy on archive.org, and replacing the dynamic mShot image.
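Step 4 is the part that is easiest to script. The sketch below queries the Wayback Machine availability endpoint (`https://archive.org/wayback/available`) and pulls out the closest snapshot URL. The function names are my own, and the code assumes the response keeps its `archived_snapshots` / `closest` layout; treat it as a starting point rather than a finished tool.

```python
import json
import urllib.parse
import urllib.request

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url):
    """Build the availability lookup URL for a given original link."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": url})


def closest_snapshot(payload):
    """Return the snapshot URL from a decoded response, or None if there is none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_mirror(url, timeout=10):
    """Ask archive.org for a snapshot of url (needs network access)."""
    with urllib.request.urlopen(availability_query(url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling `find_mirror` on a dead link returns either a `web.archive.org` address to use in the directory entry or `None`, in which case the entry probably has to be removed.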

Multiply that by 30 for the most recent batch that needs updating. 🙁

I’ve decided that, going forward, all books listed in my directory will now come with 2 links:

  1. A link to the original resource on the author’s or publisher’s website
  2. A mirror link to the matching copy on archive.org.

In the case of downloadable files, I will go the extra mile to ensure that archive.org has saved a copy of the files, too, and not just the page that links to it.
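One way to keep track of that extra mile is to record the file links alongside the page link and ask which of them still lack an archived copy. The sketch below is only an illustration: `BookEntry`, `unarchived_urls`, and the `lookup` callable are hypothetical names, and `lookup` stands for whatever you use to find a snapshot (it should return the snapshot URL, or `None` when there is none).

```python
from dataclasses import dataclass, field


@dataclass
class BookEntry:
    """A directory entry: the listing page plus any downloadable files."""
    title: str
    page_url: str
    file_urls: list = field(default_factory=list)


def unarchived_urls(entry, lookup):
    """Return the entry's URLs for which lookup() finds no archived copy."""
    candidates = [entry.page_url, *entry.file_urls]
    return [url for url in candidates if lookup(url) is None]
```

An entry is ready to list once `unarchived_urls` returns an empty list; anything it does return still needs to be saved to archive.org first.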

Then I’ll just use a static screenshot of the original page, as it exists at the time of listing, and remove the mShots plugin from my site.

I’ll just have to convince myself not to be such a perfectionist. I should not care whether the screenshots match whatever redesign the original site has applied to the page since the time of listing, nor whether the original link goes dead, since I have already done the work to provide a mirror.

Yes, it will involve a little more work for each entry at the time of listing, and a lot of work for updating all the books already listed, but then I’ll never have to check or mess with the entries again, unless someone reports the original link as pointing to questionable content (such as malware, scams, porn, etc.).

This also means that, going forward, I can spend more time expanding the directory, adding more books, rather than wasting my time maintaining the existing entries against link rot.
