A More Useful 404
Issue № 272

Encountering 404 errors is not new. Often, developers provide custom 404 pages to make the experience a little less frustrating. However, for a custom 404 page to be truly useful, it should not only provide relevant information to the user, but should also provide immediate feedback to the developer so that, when possible, the problem can be fixed.

To accomplish this, I developed a custom 404 page that can be adapted to the look and feel of the website it's used on. It uses server-side includes (SSI) to execute a Perl script that determines the cause of the 404 error and takes the appropriate action.

Overall design

To provide useful and specific information to the user, it is necessary to define the possible causes of a 404 error. Here are four possible causes:

  1. The user mistyped the URL or followed an out-of-date bookmark. These are grouped together because we’ll see that it’s not possible to distinguish one from the other.
  2. The user encountered a 404 error because of a broken link within my site.
  3. The 404 error results from a broken link returned by a search engine.
  4. The 404 error was caused by a broken link on another website, but not a search engine.

In each of these cases, the 404 page tells the user the specific cause of the error. If the broken link is on my website, or on someone else's website that is not a search engine, the Perl script also sends me, the developer, an e-mail about the broken link, including the URL of the page that contains the link and the URL of the page the user was trying to reach.

Custom 404 page

SSI allow you to include common snippets of static HTML, such as a header and footer, throughout a site. SSI pages, which typically have an .shtml extension, are processed by the server before the pages are sent to the browser.

When an SSI directive such as this one:

<!--#include virtual="/inc/header.html" -->

is encountered in the .shtml file, the server replaces that line with the contents of the file specified.

However, in addition to this rather simple function, SSI can also execute programs such as Perl scripts. In that case, the output generated by the script is inserted into the page in place of the directive and sent to the browser with the rest of the page.

Since I wanted my custom 404 page to provide specific information to the user as well as send information to me, my custom 404 page is an .shtml page in which I use SSI to execute a Perl script that does all the work. For my site, the SSI directive looks like this:

<!--#include virtual="/cgi-bin/404.pl" -->

The rest of the 404 page contains code to give the page the look and feel of the website that contains it.
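As a rough sketch of the script side, assuming Apache with mod_include and mod_cgi: a program pulled in through `#include virtual` is still run as a CGI program, so it has to print a CGI header first. Apache consumes that header and splices only the body into the .shtml page. The subroutine name `page_fragment` and the message text are placeholders, not the author's actual script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the HTML that should replace the SSI directive.
sub page_fragment {
    return "<h2>Page not found</h2>\n"
         . "<p>Sorry, but the page you were trying to get to does not exist.</p>\n";
}

# A CGI program run through <!--#include virtual="..." --> still has to send a
# CGI header. Apache consumes the header and splices only the body into the page.
print "Content-type: text/html\n\n";
print page_fragment();
```

Everything else on the page (header, footer, navigation) stays in the .shtml file, so the script only has to produce the part that changes from case to case.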

Enabling custom 404 pages

The web server needs to be configured to serve the custom error page, to process SSI, and to execute CGI scripts. This can be done either in an .htaccess file or in the Apache httpd.conf file.

First, to have Apache serve my custom 404 page when a 404 error is encountered, I add the ErrorDocument directive to the httpd.conf file or the .htaccess file. It looks like this:

ErrorDocument 404 /errorpages/404.shtml

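Second, to tell Apache to execute CGI scripts, I make sure that the Options directive in httpd.conf includes ExecCGI, or I simply add Options +ExecCGI to the .htaccess file. Third, SSI processing has to be turned on for the .shtml page with Options +Includes; on Apache 2 this is usually paired with AddType text/html .shtml and AddOutputFilter INCLUDES .shtml.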

Perl script

The Perl script does the processing to determine the appropriate action. To identify the source of the 404 error, the script reads the HTTP_REFERER environment variable, which contains the URL of the page that the user just came from. I realize there is no guarantee that this value is accurate, because it can be faked, but this isn't really a concern for this application.

In general, the Perl code performs the following steps:

  1. Check HTTP_REFERER to determine the source of the 404 error.
  2. Display the appropriate message to the user.
  3. Send me an e-mail message, if needed for the particular error.
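The three steps can be sketched as a single routine that returns the case number, checking the cases in the order this article presents them. The name `classify_404`, its arguments, and the `%sends_mail` table are hypothetical; in practice the list of search engines would come from the text file described under Case 3.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1..4 for the four cases described in this article.
# $search_engines is an array reference of URL fragments such as
# the lines of searchengines.txt.
sub classify_404 {
    my ($referer, $server_name, $search_engines) = @_;
    $referer = '' unless defined $referer;

    # Case 1: no referrer, so the address was typed or came from a bookmark.
    return 1 if $referer eq '';

    # Case 2: the referrer contains this site's name.
    return 2 if index($referer, $server_name) >= 0;

    # Case 3: the referrer matches one of the known search engines.
    for my $engine (@$search_engines) {
        next if $engine eq '';          # an empty fragment would match anything
        return 3 if index($referer, $engine) >= 0;
    }

    # Case 4: a link on some other site.
    return 4;
}

# Which cases send an e-mail to the developer (case 3 is optional; off here).
my %sends_mail = (1 => 0, 2 => 1, 3 => 0, 4 => 1);
```

Keeping the e-mail decision in a small table makes it easy to switch the optional search engine notice on later without touching the classification logic.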

Case 1: Mistyped URL or out-of-date bookmark

In the case of a mistyped URL or an out-of-date bookmark, HTTP_REFERER will be blank. In Perl, I check for this using the following code:

my $referer = $ENV{'HTTP_REFERER'} // '';   # unset when the browser sent no Referer
if (length($referer) == 0)

The Perl script displays a message in the custom 404 page that tells the user what the problem is. In the messages displayed to the user, as well as any e-mail messages sent to me, I provide the URL of the requested page using this code:

my $requested = "http://$ENV{'SERVER_NAME'}$ENV{'REQUEST_URI'}";
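Because REQUEST_URI is whatever the visitor (or anyone else) asked for, it can contain characters that are special in HTML, so it is worth escaping before it is written into the page. A minimal sketch, assuming Perl 5.10 or later for the `//` operator; `requested_url` and `html_escape` are hypothetical helper names.

```perl
use strict;
use warnings;

# Build the requested URL the same way as the line above; missing variables
# become empty strings instead of triggering "uninitialized" warnings.
sub requested_url {
    my ($env) = @_;
    return 'http://' . ($env->{'SERVER_NAME'} // '') . ($env->{'REQUEST_URI'} // '');
}

# REQUEST_URI is chosen by whoever made the request, so escape it before
# writing it into the page; otherwise markup in the URL would be rendered.
sub html_escape {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                  '"' => '&quot;', "'" => '&#39;');
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}
```

In the script, the message would then print html_escape(requested_url(\%ENV)) rather than the raw value; the e-mail body can use the unescaped URL.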

Case 2: Broken link on my website

When HTTP_REFERER is not blank, I check to see if it refers to my site, somebody else’s site, or a search engine. If it contains my domain name, then I know the user followed a link from one of my pages. I use the following Perl snippet to check for this:

if ((index($ENV{'HTTP_REFERER'}, $ENV{'SERVER_NAME'}) >= 0))

The index function returns the zero-based position of SERVER_NAME within the HTTP_REFERER string, or -1 if SERVER_NAME doesn't appear there at all. So if the result is zero or greater, I know that the user was on a page on my site.
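To make the return values concrete, and to show one possible tightening: a plain substring test also matches a referring URL that merely mentions the domain, for example in a search query string, and because this check runs before the search engine check, such a visit would be reported as a broken link on my own site. The sketch below compares only the host part of the referrer instead. This host comparison is a swapped-in technique, not the article's, and `referer_host` and `from_my_site` are hypothetical names.

```perl
use strict;
use warnings;

# index returns the zero-based position of the substring, or -1 if absent.
my $referer     = 'http://www.mydomain.com/badlink.shtml';
my $pos_found   = index($referer, 'www.mydomain.com');     # 7
my $pos_missing = index($referer, 'www.somedomain.com');   # -1

# Hypothetical stricter check: extract the host name from the referrer
# (lowercased, without port, path, or query string).
sub referer_host {
    my ($url) = @_;
    return '' unless defined $url;
    return $url =~ m{^https?://([^/:?#]+)}i ? lc $1 : '';
}

# True only when the referring page is on this exact host.
sub from_my_site {
    my ($url, $server_name) = @_;
    return referer_host($url) eq lc $server_name;
}
```

With this version, a search results URL such as http://example.net/?q=www.mydomain.com falls through to the search engine or "somebody else's site" checks instead of triggering a case 2 e-mail.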

In this case, I present a message to the user stating that I have a broken link on my page. However, rather than ask the user to send me an e-mail telling me this, the Perl script sends me an e-mail containing all of the necessary information. At the same time, I let the user know that an e-mail has just been sent and the broken link will be corrected shortly.

In the e-mail message, I set the subject of the message to clearly identify that there is a broken link on my site and provide the domain name using $ENV{'SERVER_NAME'}. This allows me to use this script on multiple sites while simplifying the sorting of any incoming messages. The body of the e-mail tells me the URL of the page the user was on, as well as the URL of the requested page.
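The article doesn't show how the e-mail is sent. One common approach on Unix hosts is to pipe the message to sendmail; this sketch assumes a sendmail-compatible program at /usr/sbin/sendmail that accepts -t (read the recipients from the message headers). The helpers `build_notice` and `send_notice` and their argument names are hypothetical, and the text follows the case 2 e-mail shown in Table 1.

```perl
use strict;
use warnings;

# Hypothetical helper: build the "broken link on my site" notice.
# Keys: to, from, server_name, referer, requested.
sub build_notice {
    my (%arg) = @_;
    my $site = $arg{server_name};
    return join("\n",
        "To: $arg{to}",
        qq{From: "$site 404 script" <$arg{from}>},
        "Subject: Broken link on my site, $site.",   # site name eases sorting
        "",                                          # blank line ends headers
        "BROKEN LINK ON MY SITE",
        "",
        "There appears to be a broken link on my page, $arg{referer}.",
        "Someone was trying to get to $arg{requested} from that page.",
    ) . "\n";
}

# Hand the message to the local mail transfer agent (path is an assumption).
sub send_notice {
    my ($message) = @_;
    open(my $mail, '|-', '/usr/sbin/sendmail', '-t')
        or die "cannot run sendmail: $!";
    print {$mail} $message;
    close($mail) or warn "sendmail exited with status $?";
}
```

In the 404 script, the arguments would come from $ENV{'SERVER_NAME'}, $ENV{'HTTP_REFERER'} and the $requested URL, followed by send_notice(build_notice(...)). The visitor-supplied URLs only appear in the body, not in the header lines.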

Case 3: Broken link from a search engine

To determine whether the user came from a search engine results page, I check HTTP_REFERER against a list of search engine URLs. This list is stored in a simple text file that the Perl script reads. Because the list lives in an external file, I can update it at any time without modifying the Perl script.

Here are the Perl snippets for this case:

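my $referrer     = $ENV{'HTTP_REFERER'} // '';
my $SEARCHENGINE = "false";
open(my $fh, '<', 'searchengines.txt') or die "cannot open searchengines.txt: $!";
while (my $line = <$fh>) {
  chomp $line;
  next if $line eq '';    # a blank line would match every referrer
  if (index($referrer, $line) >= 0) {
    $SEARCHENGINE = "true";
    last;                 # one match is enough
  }
}
close($fh);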

and then:

if ($SEARCHENGINE eq "true")
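The same lookup can be packaged as two small subroutines, which also makes it easy to test against an in-memory file. This is a sketch: `load_search_engines` and `is_search_engine` are hypothetical names, and trimming surrounding whitespace (including a stray carriage return from a file edited on Windows) goes a step beyond the loop above.

```perl
use strict;
use warnings;

# Read one URL fragment per line, trimming whitespace and skipping blank lines.
sub load_search_engines {
    my ($fh) = @_;
    my @engines;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;        # also removes "\n" and "\r\n"
        next if $line eq '';
        push @engines, $line;
    }
    return @engines;
}

# True if the referrer contains any of the listed fragments.
sub is_search_engine {
    my ($referer, @engines) = @_;
    return 0 unless defined $referer && $referer ne '';
    for my $engine (@engines) {
        return 1 if index($referer, $engine) >= 0;
    }
    return 0;
}
```

In the 404 script, the handle would come from open(my $fh, '<', 'searchengines.txt') and the referrer from $ENV{'HTTP_REFERER'}.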

In this case, I let the user know that the search engine returned an old link. Since there really isn't anything I can do about it, I don't need an e-mail message; however, I may want one anyway, just so I know about the problem.

Case 4: Broken link on somebody else's website

If the 404 was not the result of any of the three previous situations, then I know it was caused by a broken link on somebody else’s page. So again, the Perl script displays the appropriate information to the user and sends me an e-mail message. I can then go to the page with the broken link and if the page owner has provided contact information, I can notify them of the problem.

Other than Apache

There is no reason this approach shouldn't also work on web servers running Microsoft IIS. The server needs to be configured to process SSI and to allow scripts to be executed, and of course Perl needs to be available.

Finally…

Implementing this custom 404 page improves the usability of my site by helping the user, and it keeps me informed of broken links. For clarity, the table below shows the four cases I discussed, along with the message displayed to the user and any e-mail message that is sent.

Table 1: The four cases discussed

Case | Message to user | E-mail message

1. Mistyped URL or out-of-date bookmark

Message to user:

Sorry, but the page you were trying to get to, http://www.mydomain.com/no-such-page.shtml, does not exist.

It looks like this was the result of either a mistyped address or an out-of-date bookmark in your web browser.

You may want to try searching this site or using our site map to find what you were looking for.

E-mail message: None

2. Broken link on one of my pages

Message to user:

Sorry, but the page you were trying to get to, http://www.mydomain.com/no-such-page.shtml, does not exist.

Apparently, we have a broken link on our page. An e-mail has just been sent to the person who can fix this, and it should be corrected shortly. No further action is required on your part.

E-mail message:

From: www.mydomain.com 404 script

Subject: Broken link on my site, www.mydomain.com.

Message: BROKEN LINK ON MY SITE

There appears to be a broken link on my page, http://www.mydomain.com/badlink.shtml. Someone was trying to get to http://www.mydomain.com/no-such-page.shtml from that page. Why don't you take a look at it and see what's wrong?

3. Broken link on a search engine results page

Message to user:

Sorry, but the page you were trying to get to, http://www.mydomain.com/no-such-page.shtml, does not exist.

It looks like the search engine has returned a link to an old page. These old links should eventually be removed from their indexes, but since these are automatically generated, there is no one to contact to try to correct the problem.

You may want to try searching this site or using our site map to find what you were looking for.

E-mail message: Optional. An e-mail message is not needed because there isn't much I can do about the broken link, but I may go ahead and have the script send me one just so I know about it.

4. Broken link on somebody else's page

Message to user:

Sorry, but the page you were trying to get to, http://www.mydomain.com/no-such-page.shtml, does not exist.

Apparently, there is a broken link on the page you just came from. We have been notified and will attempt to contact the owner of that page and let them know about it.

You may want to try searching this site or using our site map to find what you were looking for.

E-mail message:

From: www.mydomain.com 404 script

Subject: Broken link on somebody else's site.

Message: BROKEN LINK ON SOMEBODY ELSE'S SITE

There appears to be a broken link on the page, http://www.somedomain.com/badlink.shtml. Someone was trying to get to http://www.mydomain.com/no-such-page.shtml from that page. Why don't you take a look at it and see if you can contact the page owner and let them know about it?

About the Author

Dean Frickey

Dean Frickey has been involved in the internet since 1995, teaching classes at the local university. His interests are in designing to web standards and web usability. He lives and works in Idaho Falls, Idaho.
