A List Apart

Issue № 272

A More Useful 404

Published in HTML, The Server Side

Encountering 404 errors is not new. Often, developers provide custom 404 pages to make the experience a little less frustrating. However, for a custom 404 page to be truly useful, it should not only provide relevant information to the user, but should also provide immediate feedback to the developer so that, when possible, the problem can be fixed.

To accomplish this, I developed a custom 404 page that can be adapted to the look and feel of the website it’s used on and uses server-side includes (SSI) to execute a Perl script that determines the cause of the 404 error and takes appropriate action.

Overall design

To provide useful and specific information to the user, it is necessary to define the possible causes of a 404 error. Here are four possible causes:

  1. The user mistyped the URL or followed an out-of-date bookmark. These are grouped together because we’ll see that it’s not possible to distinguish one from the other.
  2. The user encountered a 404 error because of a broken link within my site.
  3. The 404 error results from a broken link returned by a search engine.
  4. The 404 error was caused by a broken link on another website, but not a search engine.

In each of these cases, the 404 page tells the user the specific cause of the error. If the broken link is on my website or on someone else’s website (that is, it was not returned by a search engine), the Perl script sends me, the developer, an e-mail about the broken link, including the URL of the page that contains the link and the URL of the page the user was trying to reach.

Custom 404 page

SSI allow you to include common snippets of static HTML, such as a header and footer, throughout a site. SSI pages, which typically have an .shtml extension, are processed by the server before the pages are sent to the browser.

When an SSI directive such as this one:

<!--#include virtual="/inc/header.html" -->

is encountered in the .shtml file, the server replaces that line with the contents of the file specified.

However, in addition to this rather simple function, SSI can execute programs such as Perl scripts. In this case, the output generated by the Perl script is sent to the browser.

Since I wanted my custom 404 page to provide specific information to the user as well as send information to me, my custom 404 page is an .shtml page in which I use SSI to execute a Perl script that does all the work. For my site, the SSI directive looks like this.

<!--#include virtual="/cgi-bin/404.pl" -->

The rest of the 404 page contains code to give the page the look and feel of the website that contains it.
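As a rough illustration of what the included script has to produce, here is a minimal sketch. It assumes the script runs as an ordinary CGI program, so it prints a header block, a blank line, and then the HTML fragment that ends up in the page. The function name render_fragment and the wording of the message are hypothetical, not taken from the actual script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the CGI response whose body the server splices into the 404 page.
# render_fragment is a hypothetical helper name used only for this sketch.
sub render_fragment {
    my ($message) = @_;
    # A CGI response is a header block, a blank line, then the body.
    return "Content-type: text/html\n\n" . "<p>$message</p>\n";
}

print render_fragment("Sorry, but the page you were trying to get to does not exist.");
```

Because the surrounding .shtml page already supplies the header, footer, and styling, the script only needs to emit the fragment that changes from case to case.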

Enabling custom 404 pages

The web server needs to be configured to use SSI. This can be done by either using an .htaccess file, or modifying the Apache httpd.conf file.

First, to have Apache serve up my specific 404 page when a 404 error is encountered, I add the ErrorDocument directive to the httpd.conf file, or the .htaccess file. It looks like this.

ErrorDocument 404 /errorpages/404.shtml

Second, to tell Apache to execute CGI scripts, I need to make sure the Options directive in the httpd.conf file includes ExecCGI. Alternatively, I can add Options +ExecCGI to the .htaccess file.
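Put together, the configuration might look something like the fragment below. This is a sketch rather than my exact setup: the Includes option and the AddType, AddOutputFilter, and AddHandler lines are assumptions about a typical Apache 2.x install, and in an .htaccess file they only take effect if the server’s AllowOverride setting permits Options and FileInfo.

```apache
# Serve the custom page whenever a 404 occurs (path from the article)
ErrorDocument 404 /errorpages/404.shtml

# Allow SSI processing and CGI execution in this scope
Options +Includes +ExecCGI

# Have the server parse .shtml files for SSI directives
AddType text/html .shtml
AddOutputFilter INCLUDES .shtml

# Treat .pl files as CGI scripts
# (assumption: only needed where /cgi-bin/ is not already a ScriptAlias)
AddHandler cgi-script .pl
```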

Perl script

The Perl script does the processing to determine the appropriate action. To identify the source of the 404 error, the Perl script accesses the HTTP_REFERER environment variable. HTTP_REFERER contains the URL of the page that the user just came from. There is no guarantee that this value is accurate, because the browser can omit or fake it, but that isn’t really a concern for this application.

In general, the Perl code performs the following steps:

  1. Check HTTP_REFERER to determine the source of the 404 error.
  2. Display the appropriate message to the user.
  3. Send me an e-mail message, if needed for the particular error.
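The first step can be sketched as a single function that maps the referrer to one of the four causes. This is an illustration under assumptions, not the script itself: the function name classify_404, the labels it returns, and the search.example entry are all hypothetical.

```perl
use strict;
use warnings;

# Classify a 404 by its referrer. Returns one of the hypothetical labels
# 'typed_or_bookmark', 'own_site', 'search_engine', or 'other_site'.
sub classify_404 {
    my ($referer, $server_name, $engines) = @_;

    # Case 1: no referrer at all
    return 'typed_or_bookmark' if !defined $referer || $referer eq '';

    # Case 2: the referrer contains my own domain name
    return 'own_site' if index($referer, $server_name) >= 0;

    # Case 3: the referrer contains one of the listed search engine names
    for my $engine (@$engines) {
        next if $engine eq '';    # an empty entry would match everything
        return 'search_engine' if index($referer, $engine) >= 0;
    }

    # Case 4: anything else is a link on somebody else's site
    return 'other_site';
}

# Example call with placeholder values
my @engines = ('search.example');
print classify_404('http://www.somedomain.com/badlink.shtml',
                   'www.mydomain.com', \@engines), "\n";
```

The remaining two steps then depend only on the label: pick the message to display, and send an e-mail for the own-site and other-site cases.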

Case 1: Mistyped URL or out-of-date bookmark

In the case of a mistyped URL or an out-of-date bookmark, HTTP_REFERER will be blank or not set at all. In Perl, I check for this using the following code:

if (!defined $ENV{'HTTP_REFERER'} || length($ENV{'HTTP_REFERER'}) == 0)

The Perl script displays a message in the custom 404 page that tells the user what the problem is. In the messages displayed to the user, as well as any e-mail messages sent to me, I provide the URL of the requested page using this code:

my $requested = "http://$ENV{'SERVER_NAME'}$ENV{'REQUEST_URI'}";
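Since the requested URL comes from the visitor and is echoed back into the page, it is probably wise to escape it before display. The sketch below shows one way to do that. The helper names escape_html and requested_url are hypothetical, and the sample query string is made up for the example.

```perl
use strict;
use warnings;

# Replace the characters that are significant in HTML with entities.
sub escape_html {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&#39;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Same construction as in the article, with the values passed in
# instead of read from %ENV so the function is easy to try out.
sub requested_url {
    my ($server_name, $request_uri) = @_;
    return "http://$server_name$request_uri";
}

my $requested = requested_url('www.mydomain.com', '/no-such-page.shtml?a=1&b=<x>');
print escape_html($requested), "\n";
```

In the real script the two arguments would be $ENV{'SERVER_NAME'} and $ENV{'REQUEST_URI'}.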

Case 2: Broken link on my website

When HTTP_REFERER is not blank, I check to see if it refers to my site, somebody else’s site, or a search engine. If it contains my domain name, then I know the user followed a link from one of my pages. I use the following Perl snippet to check for this:

if ((index($ENV{'HTTP_REFERER'}, $ENV{'SERVER_NAME'}) >= 0))

The index function returns the zero-based position of SERVER_NAME within the HTTP_REFERER string, or -1 if it is not found. If the result is zero or greater, I know that the user was on a page on my site.
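One caveat of a substring check is that it also matches when my domain name merely appears somewhere in the referring URL, for example in a query string. If that matters, a stricter variant could compare only the host part. The sketch below is one possible way to do it; referer_host and from_own_site are hypothetical names, and the regular expression handles only plain http and https URLs.

```perl
use strict;
use warnings;

# Extract the lowercased host name from an absolute http or https URL.
# Returns undef when the referrer is missing or has another form.
sub referer_host {
    my ($referer) = @_;
    return undef unless defined $referer;
    return lc($1) if $referer =~ m{^https?://([^/:?#]+)}i;
    return undef;
}

# True (1) only when the referrer's host is exactly my server name.
sub from_own_site {
    my ($referer, $server_name) = @_;
    my $host = referer_host($referer);
    return (defined($host) && $host eq lc($server_name)) ? 1 : 0;
}

# The substring test matches this referrer, the host comparison does not.
my $tricky = 'http://search.example/?q=www.mydomain.com';
print "index: ", index($tricky, 'www.mydomain.com'), "\n";
print "host match: ", from_own_site($tricky, 'www.mydomain.com'), "\n";
```

For this application the simpler index test is usually good enough, since a misclassified 404 only changes which message is shown.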

In this case, I present a message to the user stating that I have a broken link on my page. However, rather than ask the user to send me an e-mail telling me this, the Perl script sends me an e-mail containing all of the necessary information. At the same time, I let the user know that an e-mail has just been sent and the broken link will be corrected shortly.

In the e-mail message, I set the subject of the message to clearly identify that there is a broken link on my site and provide the domain name using $ENV{'SERVER_NAME'}. This allows me to use this script on multiple sites while simplifying the sorting of any incoming messages. The body of the e-mail tells me the URL of the page the user was on, as well as the URL of the requested page.
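Here is a sketch of how such a message could be assembled and handed to the local mail system. Several things are assumptions: the function names, the e-mail addresses, and the location of sendmail at /usr/sbin/sendmail, which varies between servers. Only the message-building function is exercised in the example; send_mail is shown for completeness.

```perl
use strict;
use warnings;

# Assemble the headers and body for a "broken link on my site" message.
sub build_broken_link_mail {
    my (%arg) = @_;
    return join "\n",
        "To: $arg{to}",
        qq{From: "$arg{server_name} 404 script" <$arg{from}>},
        "Subject: Broken link on my site, $arg{server_name}",
        "",                               # blank line ends the headers
        "BROKEN LINK ON MY SITE",
        "",
        "There appears to be a broken link on my page, $arg{referer}.",
        "Someone was trying to get to $arg{requested} from that page.",
        "";
}

# Pipe the finished message to sendmail, which reads the recipients
# from the headers because of the -t switch.
# Assumption: sendmail (or a compatible program) lives at this path.
sub send_mail {
    my ($message) = @_;
    open(my $mail, '|-', '/usr/sbin/sendmail', '-t')
        or die "cannot run sendmail: $!";
    print {$mail} $message;
    close($mail) or die "sendmail failed: $!";
}

# Build (but do not send) a sample message with placeholder values.
my $message = build_broken_link_mail(
    to          => 'webmaster@mydomain.com',
    from        => 'noreply@mydomain.com',
    server_name => 'www.mydomain.com',
    referer     => 'http://www.mydomain.com/badlink.shtml',
    requested   => 'http://www.mydomain.com/no-such-page.shtml',
);
print $message;
```

Keeping the message construction separate from the sending makes it easy to reuse the same code for the "somebody else’s site" case with a different subject and body.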

Case 3: Broken link from a search engine

To determine if the user came from a search engine results page, I check HTTP_REFERER against a list of search engine URLs. This list is stored in a simple text file that the Perl script reads. By using an external file containing a list of URLs, I can update the list at any time and not have to modify the Perl.

Here are the Perl snippets for this case:

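my $referrer = $ENV{'HTTP_REFERER'};
my $SEARCHENGINE = "false";
# Read the list of search engine URLs, one per line
open(FILE, "<", "searchengines.txt") or die "cannot open the file: $!";
while (<FILE>) {
  chomp;
  next if $_ eq "";   # a blank line would match every referrer
  if (index($referrer, $_) >= 0) {
    $SEARCHENGINE = "true";
  }
}
close(FILE);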

then,

if ($SEARCHENGINE eq "true")

In this case, I let the user know that the search engine returned an old link. Since there really isn’t anything I can do about it, I don’t need an e-mail message. However, I may want one just so I know about the problem.
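The same check can also be written as two small functions, which makes it easier to try out without a real file. The sketch below is one possible arrangement: load_engines and is_search_engine are hypothetical names, and the two list entries are invented placeholders read from an in-memory file rather than from searchengines.txt.

```perl
use strict;
use warnings;

# Read one search engine name per line, skipping blank lines.
sub load_engines {
    my ($fh) = @_;
    my @engines;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;     # trim surrounding whitespace
        next if $line eq '';
        push @engines, $line;
    }
    return @engines;
}

# Return 1 if the referrer contains any of the listed names, else 0.
sub is_search_engine {
    my ($referrer, @engines) = @_;
    return 0 unless defined $referrer;
    for my $engine (@engines) {
        return 1 if index($referrer, $engine) >= 0;
    }
    return 0;
}

# Demonstration with an in-memory file holding hypothetical entries
my $sample = "search.example\n\nfind.example\n";
open(my $fh, '<', \$sample) or die "cannot open in-memory file: $!";
my @engines = load_engines($fh);
close($fh);

print is_search_engine('http://search.example/results?q=widgets', @engines)
    ? "search engine\n"
    : "not a search engine\n";
```

In the real script the filehandle would come from opening searchengines.txt, and the referrer from $ENV{'HTTP_REFERER'}.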

Case 4: Broken link on somebody else’s website

If the 404 was not the result of any of the three previous situations, then I know it was caused by a broken link on somebody else’s page. So again, the Perl script displays the appropriate information to the user and sends me an e-mail message. I can then go to the page with the broken link and if the page owner has provided contact information, I can notify them of the problem.

Other than Apache

The same approach should also work on web servers running Microsoft IIS, provided the server is configured to allow scripts to be executed and Perl is available.

Finally…

Implementing this custom 404 page improves the usability of my site by helping the user, and it keeps me informed of broken links. For clarity, the table below shows the four cases I discussed, along with the message displayed to the user and any e-mail message that is sent.

Table 1: The four cases discussed
                                                             
1. Mistyped URL or out-of-date bookmark

Message to user:

Sorry, but the page you were trying to get to, http://www.mydomain.com/no-such-page.shtml, does not exist.

It looks like this was the result of either a mistyped address or an out-of-date bookmark in your web browser.

You may want to try searching this site or using our site map to find what you were looking for.

E-mail message:

None

2. Broken link on one of my pages

Message to user:

Sorry, but the page you were trying to get to, http://www.mydomain.com/no-such-page.shtml, does not exist.

Apparently, we have a broken link on our page. An e-mail has just been sent to the person who can fix this and it should be corrected shortly. No further action is required on your part.

E-mail message:

From: www.mydomain.com 404 script

Subject: Broken link on my site, www.mydomain.com.

Message: BROKEN LINK ON MY SITE

There appears to be a broken link on my page, http://www.mydomain.com/badlink.shtml. Someone was trying to get to http://www.mydomain.com/no-such-page.shtml from that page. Why don’t you take a look at it and see what’s wrong?

3. Broken link on a search engine results page

Message to user:

Sorry, but the page you were trying to get to, http://www.mydomain.com/no-such-page.shtml, does not exist.

It looks like the search engine has returned a link to an old page. These old links should eventually be removed from their indexes, but since these are automatically generated, there is no one to contact to try to correct the problem.

You may want to try searching this site or using our site map to find what you were looking for.

E-mail message:

Optional. An e-mail message is not needed because there isn’t much I can do about the broken link, but I may go ahead and have the script send me one just so I know about it.

4. Broken link on somebody else’s page

Message to user:

Sorry, but the page you were trying to get to, http://www.mydomain.com/no-such-page.shtml, does not exist.

Apparently, there is a broken link on the page you just came from. We have been notified and will attempt to contact the owner of that page and let them know about it.

You may want to try searching this site or using our site map to find what you were looking for.

E-mail message:

From: www.mydomain.com 404 script

Subject: Broken link on somebody else’s site.

Message: BROKEN LINK ON SOMEBODY ELSE’S SITE

There appears to be a broken link on the page, http://www.somedomain.com/badlink.shtml. Someone was trying to get to http://www.mydomain.com/no-such-page.shtml from that page. Why don’t you take a look at it and see if you can contact the page owner and let them know about it?
