Link Rot – The Keegan Blog


Those who run websites have two equally important responsibilities: first, Content Creation, and second, Content Maintenance, which means making certain that the content looks and functions the way it should. That means dealing with a physical quantity called entropy, which measures the degree of disorder in a system.¹ Entropy increases with the passage of time, so at some point any website will exhibit signs of disorder and dysfunction. Maintenance should be a continuous process.

The most common ways entropy manifests itself in websites are Dead Links and Link Rot.² These terms describe the same thing in different ways. Links stop working for any one, or a combination, of four reasons.

1. Websites go down because of server failure or non-payment. As a result, links to those sites die.
2. Sites and pages are moved, and no one puts a redirect into the .htaccess file, either out of neglect or because their content management system does not allow them to do so, and those links die.
3. Just as often, syntax errors occur during creation, editing, and updating: for example, while updating a site to responsive design or HTML5, or during routine maintenance such as checking for and replacing dead links.
4. Weird stuff happens for no apparent reason.

Experience demonstrates that the fourth reason happens most often. The second most common reason is that URLs change.³ Syntax errors, that is, errors in coding, are third in frequency and are usually internal. The least frequent is neglecting to write a redirect into the .htaccess file.
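For readers who have never written a redirect, here is a minimal sketch of how the lines for an .htaccess file might be generated when pages move. It assumes an Apache server with mod_alias enabled and uses that module's `Redirect 301` directive. The function name `redirect_lines` and the example paths are hypothetical, and the output should be checked against your own server's documentation before it goes live.

```python
from urllib.parse import urlsplit


def redirect_lines(moved: dict[str, str]) -> list[str]:
    """Build one Apache mod_alias `Redirect 301` line per moved page.

    `moved` maps an old URL path on this site to the page's new address,
    which may be a full http(s) URL or a path starting with a slash.
    """
    lines = []
    for old_path, new_url in sorted(moved.items()):
        # This sketch does not handle quoting, so reject whitespace outright.
        if any(ch.isspace() for ch in old_path + new_url):
            raise ValueError(f"whitespace is not handled here: {old_path!r}")
        if not old_path.startswith("/"):
            raise ValueError(f"old path must start with '/': {old_path!r}")
        target = urlsplit(new_url)
        is_full_url = target.scheme in ("http", "https") and bool(target.netloc)
        if not is_full_url and not new_url.startswith("/"):
            raise ValueError(f"target must be a URL or a path: {new_url!r}")
        lines.append(f"Redirect 301 {old_path} {new_url}")
    return lines


if __name__ == "__main__":
    # Hypothetical example: two pages that moved during a site reorganization.
    example = {
        "/old-page.html": "https://example.com/new-page/",
        "/notes/": "/archive/notes/",
    }
    print("\n".join(redirect_lines(example)))
```

Keeping the old-to-new mapping in one place has a side benefit: it doubles as a record of what moved and when, which helps the next time links are checked.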

The need for redirects suggests major changes in website architecture, such as changing servers, adding or deleting directories, or changing content management systems. For example, some years ago I was hosting two WordPress instances as part of jgkeegan.com, but they required constant updating, and I was unable to keep them up to date, so my service provider would periodically shut them down. As a result, I purchased a new domain, keegan.blog, and moved both the Keegan Blog and the Keegan Podcast to WordPress.com. While the content of both is much easier to maintain, and it is much easier to post on the go, weird stuff does happen. Recently, I was routinely checking the blog for link rot and discovered numerous errors in one of my older posts, Intelligent Design is a Social Theory.⁴ Opening the editor to start correcting them, I found that in both editing views only a third of the content was present. However, all of the content was present when viewing the post in the browser. That did not make sense. Not wanting to risk the loss of two-thirds of my content, I copied the post from the browser and coded it again, making corrections as I went, which solved the problem.

Checking for link rot and other Content Maintenance should be continuous, but stuff happens, and, at the very least, routine checks should happen about every three months. Dead links are the most common maintenance issue on websites. Each outgoing link on a website should be physically inspected and replaced if it does not function. On large sites, anything over 250 pages, some type of automatic link checker should be used. I was reluctant to use an automatic link checker because there is no substitute for physical inspection. However, an automatic link checker does give an author a place to start. Recently, I searched for an automatic link checker and found Free Broken Link Checker–Online Dead Link Checking Tool.⁵ It has two useful features: it lists the broken URLs in a table with a link to the originating page, and it allows the author to view the code. It does, however, have a major drawback: the checker does not recognize changes immediately. That is, if a check is run on a domain, dead links are found and changed, and the checker is run again immediately, it will return the same information because not enough time has passed for the changes to be recognized.
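To give a sense of what an automatic link checker does under the hood, here is a rough sketch using only the Python standard library. It is not how the tool mentioned above works, which I have no way of knowing. The names `LinkCollector`, `outgoing_links`, and `check_link`, the user-agent string, and the ten-second timeout are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin, urlsplit
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def outgoing_links(html: str, base_url: str) -> list[str]:
    """Return the unique http(s) links that point away from base_url's host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    own_host = urlsplit(base_url).netloc
    found: list[str] = []
    for link in collector.links:
        parts = urlsplit(link)
        is_web = parts.scheme in ("http", "https")
        if is_web and parts.netloc != own_host and link not in found:
            found.append(link)
    return found


def check_link(url: str, timeout: float = 10.0) -> str:
    """Fetch url and report 'ok', 'dead (HTTP <code>)', or 'unreachable'."""
    request = Request(url, headers={"User-Agent": "link-rot-sketch/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return "ok" if response.status < 400 else f"dead (HTTP {response.status})"
    except HTTPError as error:
        # The server answered, but with an error status such as 404 or 410.
        return f"dead (HTTP {error.code})"
    except OSError:
        # DNS failure, refused connection, timeout: the site may be down.
        return "unreachable"


if __name__ == "__main__":
    # Hypothetical page; a real run would read a saved copy of each post.
    page = '<p>See the <a href="https://archive.org/web/">Wayback Machine</a>.</p>'
    for link in outgoing_links(page, "https://example.com/post/"):
        print(link, check_link(link))
```

A link reported as "ok" can still point to a page whose content has changed, and some servers refuse automated requests, so a report like this is only a place to start. Physical inspection is still needed.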

Once a broken link is found, what happens next? If the link is important enough, the link text should be searched for, and if the same information is found elsewhere, the link should be replaced. If the same information is not found, then the URL should be put into the Wayback Machine at the Internet Archive, and the most recent snapshot of the information should be used.⁶ There are instances, however, when the Wayback Machine has not archived a page or site. In that case, if the page is a list of links, the dead link should be removed and replaced if possible. If the dead link is in a footnote that is part of an article or academic paper and the source cannot be found, it should not be removed; that is why the access date is given in the footnote.
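The steps above can be sketched in code. The example below assumes the Internet Archive's Wayback availability endpoint at `https://archive.org/wayback/available`, which returns JSON describing the closest snapshot, and, to my understanding, the most recent one when no timestamp is supplied. The function names and the `context` labels ("footnote", "link_list") are hypothetical, chosen only to mirror the paragraph above.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(dead_url: str) -> str:
    """Build the availability-API request address for a dead URL."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': dead_url})}"


def closest_snapshot(response_text: str) -> Optional[str]:
    """Pull the snapshot address out of the API's JSON reply, if there is one."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def fetch_snapshot(dead_url: str, timeout: float = 10.0) -> Optional[str]:
    """Ask the Wayback Machine for a snapshot of dead_url (needs network access)."""
    with urlopen(availability_query(dead_url), timeout=timeout) as response:
        return closest_snapshot(response.read().decode("utf-8"))


def triage(live_replacement: Optional[str], snapshot: Optional[str], context: str) -> str:
    """Decide what to do with a dead link, following the order given in the text."""
    if live_replacement:
        return f"replace with {live_replacement}"
    if snapshot:
        return f"replace with archived copy {snapshot}"
    if context == "footnote":
        # The access date in the footnote records when the source was live.
        return "keep the footnote as written"
    if context == "link_list":
        return "remove the link and replace it if possible"
    # The text gives no rule for other cases, so leave them to the author.
    return "decide by hand"


if __name__ == "__main__":
    # A made-up reply in the shape the API documents; no network call is made here.
    reply = (
        '{"archived_snapshots": {"closest": {"available": true, '
        '"url": "http://web.archive.org/web/20210724000000/https://example.com/", '
        '"timestamp": "20210724000000", "status": "200"}}}'
    )
    print(triage(None, closest_snapshot(reply), "link_list"))
```

Finding a live replacement is left to the author because it depends on searching for the link text and judging whether the result carries the same information.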

Content Maintenance is as important to a website as Content Creation and a functioning server. Content Maintenance should be a continuous process for a website to maintain its usefulness. Entropy increases with time.

Notes

¹ Stephen W. Hawking, A Brief History of Time (New York: Bantam, 1988), 102.

² PCMag Encyclopedia, s.v. "Dead Link," accessed July 24, 2021, https://www.pcmag.com/encyclopedia/term/dead-link. "A hyperlink on a website that points to a Web page that has been deleted or moved. Also called an orphan link, it may also be a temporary condition if the Web server is down"; PCMag Encyclopedia, s.v. "Link Rot," accessed July 24, 2021, https://www.pcmag.com/encyclopedia/term/link-rot. "Invalid hyperlinks on the Web. The more years go by, the more link rot because pages are moved to new locations or deleted."

³ PCMag Encyclopedia, s.v. "URL," accessed July 24, 2021, https://www.pcmag.com/encyclopedia/term/url. "(Uniform Resource Locator) The address that defines the route to a file on an Internet server (Web server, mail server, etc.). URLs are typed into a Web browser to access Web pages and files, and URLs are embedded within the pages themselves as links…."

⁴ John Keegan, "Intelligent Design is a Social Theory," The Keegan Blog, accessed August 3, 2021, https://keegan.blog/2005/10/30/intelligent-design-is-a-social-theory/.

⁵ Free Broken Link Checker–Online Dead Link Checking Tool, accessed July 24, 2021, https://www.brokenlinkcheck.com/.

⁶ Wayback Machine, Internet Archive, accessed July 24, 2021, https://archive.org/web/. The Wayback Machine provides access to snapshots of billions of websites that exist and have existed on the Internet.


Running Websites and The Second Law of Thermodynamics
