A More Useful 404

by Dean Frickey

56 Reader Comments

  1. Fine, I’m not a professional; I’m not even old enough to be classed as one. But personally, and from my point of view, can I congratulate the author on another great ALA article. It appeals to my more practical mind, and even made me update my 404 page.
    In response to everyone else’s comments, I would like to add my own thoughts. Firstly, I rewrote this in PHP, which I think reduces the security concerns. (If someone more experienced would like to comment on this, please do; I love being proved wrong.) Secondly, I think that automated emails do have their advantages: having received four about one link motivated me to do something about it. Logs are very good for statistics but don’t give me the imperative to do something (stress on the “me” there).
    And if James is following these comments, it would be interesting to know what harm emails cause.

  2. If a spider follows a link to a page that doesn’t exist, then the e-mail message from that will allow me to correct the link and the 404 goes away.

    Except that Slurp goes around deliberately making up URLs that it expects not to exist, so that it can check the site is correctly sending a 404 for non-existent pages (so it knows it can assume that a 200 page really is A-OK). You don’t want to be notified of every instance of this. I’m sure there is something in the user-agent string that you could look for, with some sort of trickery, to filter those out.

  3. A couple of readers have written, concerned that spiders and bots could result in a large number of e-mails being sent.  But spiders and bots that are guessing at URLs will not generate e-mails, because they are not following bad links and therefore will (probably) not have an HTTP_REFERER.  As I mentioned, HTTP_REFERER can be faked, so I’m not going to say for certain that this is always the case with all spiders or bots.  However, I have been using the ideas presented here for the past few years and have yet to experience any problems with spiders or bots accessing the site.
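
    A minimal sketch of the filtering discussed in comments 2 and 3, written in Python rather than the article's Perl. The host name and the list of user-agent markers are assumptions made up for illustration, not values from the article, and a real list of crawler markers would need to be maintained by hand.

```python
from urllib.parse import urlparse

# Hypothetical values for illustration; neither comes from the article.
BOT_MARKERS = ("slurp", "googlebot", "bingbot", "spider", "crawler")


def should_notify(referer, user_agent):
    """Return True only when a 404 looks like a followed bad link."""
    if not referer:
        # Typed URLs, bookmarks and URL-guessing bots usually send no referer.
        return False
    agent = (user_agent or "").lower()
    if any(marker in agent for marker in BOT_MARKERS):
        # The referer can be faked, so also skip self-identified crawlers.
        return False
    # Ignore referer values that do not parse to a real host.
    return bool(urlparse(referer).hostname)
```

    With this check, a request carrying a referer and an ordinary browser user agent triggers a notification, while a referer-less request or one whose user agent contains "Slurp" does not.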

  4. I have taken this stuff into consideration when doing redesigns for a number of sites. Instead of sending emails, I created a database to capture that data and then allowed the client to provide a correct URL for common issue pages and turn it into a redirect page. It also allows the user to see a list grouped by common pages and know how often it comes up.

    Much more useful than an email every time something comes up. Also more scalable than the method in the article.
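
    A rough sketch of that database approach, assuming SQLite and a single hypothetical table named `not_found`; the commenter's actual schema is not described, so the table, columns, and function names here are all invented for illustration.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) the hypothetical 404 log."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS not_found ("
        " url TEXT PRIMARY KEY,"
        " hits INTEGER NOT NULL DEFAULT 0,"
        " redirect_to TEXT)"
    )
    return conn


def record_404(conn, url):
    """Count one more miss for url, inserting the row on first sight."""
    cur = conn.execute("UPDATE not_found SET hits = hits + 1 WHERE url = ?", (url,))
    if cur.rowcount == 0:
        conn.execute("INSERT INTO not_found (url, hits) VALUES (?, 1)", (url,))
    conn.commit()


def set_redirect(conn, url, target):
    """Let the site owner attach a correct URL to a frequently missed one."""
    conn.execute("UPDATE not_found SET redirect_to = ? WHERE url = ?", (target, url))
    conn.commit()


def lookup_redirect(conn, url):
    """Return the owner-supplied target for url, or None if there is none."""
    row = conn.execute(
        "SELECT redirect_to FROM not_found WHERE url = ?", (url,)
    ).fetchone()
    return row[0] if row else None


def most_common(conn, limit=10):
    """List missed URLs grouped by how often they come up."""
    return conn.execute(
        "SELECT url, hits FROM not_found ORDER BY hits DESC, url LIMIT ?", (limit,)
    ).fetchall()
```

    The 404 handler would call `lookup_redirect` first and fall back to `record_404` when no target has been supplied yet.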

  5. Just one thing.

    If a user is reading a certain page on your site, and manually types a new URL but gets a 404 error, wouldn’t that count as “a bad link on your site”?

    For example, someone that’s on www.domain.com and types in the address bar www.domain.com/contact.

    Wouldn’t that send an HTTP_REFERER with your domain? Then you would get an email saying there’s a bad link on the index page when there really isn’t.

    Maybe I’m getting confused here, but I wanted to ask to make sure.

  6. I was looking for a place where I could get complete info about 404s. Thanks to the blog owner for writing such a good post.

  7. Apache is more than happy to use a CGI as your ErrorDocument:

    ErrorDocument 404 /cgi-bin/404.pl

    If you don’t want that cgi-bin in the URL, just go for

    Alias /404-not-found /cgi-bin/404.pl
    ErrorDocument 404 /404-not-found

  8. Kevin Selles:  It’s a good question, so thanks for asking it.  The referer header is only sent by the browser when a link is clicked so, no, manually entering an incorrect URL will not generate an e-mail, regardless of the page you’re currently viewing.

  9. Dick Davies: You are correct in that Apache could be configured to call the Perl script directly. But when doing this the Perl script would be responsible for building the complete 404 page with all of the elements and styles necessary to have the look and feel of the website.  And it will be more difficult to access the styles and shared elements which would be located somewhere under document root.

    By executing the Perl script from within the .shtml page, the design of the 404 page (i.e. headers, footers, navigation, etc.) is easy. If you have a template for your site, the 404 page is simply a template file with the line,
    <!--#include virtual="/cgi-bin/404.pl" -->
    inserted at the point where the content needs to appear.

  10. Dean Frickey: Many thanks for the explanation.

  11. Thanks for giving me the chance to join this discussion. I hope I can get lots of info here.

  12. I thought mine was useful, but this goes to another level. Nice.

  13. I’ve created a WordPress plugin that mimics the behavior described in this article, with configurable options in the admin area. Enjoy!

    Download: “Useful 404s for WordPress”:http://skullbit.com/wordpress-plugin/useful-404s/

  14. Hi, I tried this on some of my sites and they got better SERPs in only five days. I don’t know if this is the result of making my 404 page more friendly; I only know that it is the only change I made to my site. I also tried the WP plugin from Marcus and it works fine. Thanks.

  15. Great article! I really love the image also!

  16. Very nice, Marcus.

  17. Several readers have commented that I need to be sure to send the correct HTTP header in the response generated by my script.  I’ll admit I hadn’t given this any thought, so I started looking into it.  I used the Live HTTP Headers Firefox extension to watch the HTTP traffic.  When I select a link or type in a URL to a page that doesn’t exist, I receive “HTTP/1.x 404 Not Found.” So it seems that Apache is sending the proper header.  If anyone has more to add to this, please speak up.

  18. You say you have a text file of search engine referers – could you make this available? It would save a bit of time compiling one…

  19. Thanks, Dean, for your article! It has already helped me provide a better user experience on my websites.

    In the meantime I’ve created a Ruby script which mimics your Perl script.
    You can download it from the “‘More useful 404 error page’”:http://github.com/perfectionlabs/more-useful-404-error-page/ project on GitHub.

  20. If you are getting a 404 from search engine referrals, that’s because you forgot to set up a 301 redirect to the new URL. Otherwise you should have a custom 410 page saying that the resource was removed for good. See “RFC 2616”:www.w3.org/Protocols/rfc2616/rfc2616.html for the HTTP 1.1 specification.
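
    The distinction drawn here between moved, removed, and unknown pages could be sketched as below, in Python. The `MOVED` and `GONE` tables and their entries are hypothetical placeholders; a real site would fill them from its own history of renamed and deleted pages.

```python
# Hypothetical tables for illustration; the paths are made up.
MOVED = {"/old-article.html": "/articles/new-article/"}
GONE = {"/discontinued-product.html"}


def resolve_missing(path):
    """Return (status, location) for a path with no page behind it."""
    if path in MOVED:
        # The content lives elsewhere: permanent redirect to the new URL.
        return 301, MOVED[path]
    if path in GONE:
        # Deliberately removed for good: say so instead of "not found".
        return 410, None
    # Nothing is known about this path: a plain 404.
    return 404, None
```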

  22. Jeremy Flint,

    We use google 404 with our University website in combination with analytics so we know when pages are broken.  I would like to combine our current setup with some of the things that were discussed in this article.  Google 404 works really well for us…here is an example: http://www.uwgb.edu/asdf

  23. I’ve been looking for some good code for this for a while, and I’m especially impressed by how this takes all of the error scenarios into account in order to address the issue.  What I didn’t see was an easy download link for the Perl script. Am I just supposed to copy and paste all the code snippets on the page together?

    Also, I’ve noticed when you get a 404 from sites like Google.com, it doesn’t display the actual 404 page url, but displays the 404 page ON the URL you typed.

    Example:

    http://www.google.com/oops404

    Even though there is no oops404 page, it looks like there is.  If someone wanted to integrate that functionality into this code, is that doable, and how would you go about it?

  24. Hi this is a great article, most of the stuff I have converted to PHP.

    However, I can’t use $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI'] to determine the page the user was trying to get. I just get the .php error page that the user was forwarded to.

    Does anyone have the same problem? How did you solve it?

    Thanks
    Steven

     

  25. What is actually quite fun to play with is using the PHP similar_text() function (or anything similar) to match the desired URL against a list of valid URLs for a site and redirect to the closest match.  With certain limits on how close the match must be, I use it to fix problems when a user mistypes one letter in a longish URL, or does a bad copy-and-paste job and adds or drops a letter.
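
    A small sketch of the same idea in Python, using the standard library's `difflib.get_close_matches` in place of PHP's `similar_text()`. The list of valid paths and the 0.85 cutoff are assumptions chosen for illustration; the commenter does not say which limits they use.

```python
import difflib

# Hypothetical list of valid paths; a real site would generate this.
VALID_PATHS = [
    "/articles/a-more-useful-404",
    "/articles/perfect404",
    "/contact",
    "/about",
]


def closest_path(requested, valid_paths=VALID_PATHS, cutoff=0.85):
    """Return the valid path most similar to requested, or None.

    cutoff is the minimum similarity ratio (0 to 1) a candidate must
    reach, so wildly different URLs are not redirected anywhere.
    """
    matches = difflib.get_close_matches(requested, valid_paths, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

    A high cutoff keeps the redirect limited to near misses such as one added or dropped letter; anything below it should fall through to the normal 404 page.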

  26. How could there be a more useful 404 when you already published… “The Perfect 404”:
    http://www.alistapart.com/articles/perfect404
    by Ian Lloyd January 16, 2004
    Just kidding, good stuff.

  27. I didn’t take the time to read through all of the comments, so I don’t know if this was already suggested. For my 404 pages I use my site’s search functionality. If a URL is mistyped, I show a little message that says “We couldn’t find the page you requested. Did you mean one of these:” Then I display the search results from my site search.

  28. In reply to the Google 404 widget post, another similar idea – the “Linkgraph”:http://linkgraph.net/ widget, was released in December 2008. It’s a tool like the “Google 404 widget”:http://googlewebmastercentral.blogspot.com/2008/08/make-your-404-pages-more-useful.html only the Linkgraph widget uses a database of all previous URLs of a site’s pages to get the right URL when you click a broken link. Provided you got to the page through a broken link of course.

  29. Hallo, I’ve tried it, and I have to say it works great. Thanks.

  30. I feel that 404 pages are useless, but mainly for SEO. What I would do, and what we suggest doing, is putting a PHP 301 redirect above the header of your 404 page.  While this does not give a great user experience (unless you rework the page it’s directing to, to signal that the page was not found), it does 301 any 404 page that comes up before the search engines can see that it’s a 404, thus preserving a percentage of the link juice and pushing it on to the page it redirects to.

  31. I agree with Bill that 404s are bad for SEO; however, 301-ing all 404s to a single other page isn’t quite optimal either. The optimal (if sometimes unattainable) experience would be for any old page/URL to get 301 redirected to the new/working page that is most applicable to the old page.  This is really common when our clients move to a new CMS, for example.  The old pages (the content of them) are still on the new site, but all at new URLs.  We’ve actually done this so many times we built a tool called www.errorlytics.com that does exactly this quickly and easily for sites running PHP, JS and Rails…and has WordPress and Drupal plugins/modules.  The idea of the tool is to 1) make you aware of the often ignored 404 and 2) make it so you can get rid of them via 301 redirects, thus preserving SEO and the end user experience.

    The key edge that Errorlytics offers over tools like, for example, linkgraph is that it allows the webmaster not only to ultimately get the user who has requested a bad/dead URL to get to a page that has some content on it – but Errorlytics does this via an SEO friendly 301 so even the spiders are happy.  Using a frameset and a meta refresh is far from optimal on the SEO side.

  32. Steven, I just tried to mimic this with .htaccess and PHP, and for me $_SERVER["SCRIPT_URI"] returned the URL of the page the user asked for. You can temporarily send your 404 to a .php page with

    <?php phpinfo (); ?>

    then go to any non-existent page on your website. The 404 page will show the PHP info page; just examine the variables it shows and I’m sure you will find the page you entered, not only the name of the 404 script. Just do not forget to restore the 404 to your original script after fixing the problem.
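
    The lookup described here could be sketched as a small Python helper that tries several CGI-style variables in turn. Which of them actually holds the originally requested page depends on the server and on how the error document is configured, so the order below, the function name, and the default `/404.php` error-page path are assumptions for illustration rather than a definitive recipe.

```python
from urllib.parse import urlparse


def requested_path(environ, error_page="/404.php"):
    """Best guess at the path the visitor asked for.

    environ is a mapping of CGI-style variables (for example os.environ
    inside a CGI script). Values equal to error_page are skipped, since
    they name the error script rather than the missing page.
    """
    candidates = [environ.get("REDIRECT_URL"), environ.get("REQUEST_URI")]
    script_uri = environ.get("SCRIPT_URI")
    if script_uri:
        # SCRIPT_URI is a full URL, so keep only its path component.
        candidates.append(urlparse(script_uri).path)
    for value in candidates:
        if value and value != error_page:
            return value
    return None
```

    In a CGI error handler this would be called as `requested_path(os.environ)`; a `None` result means none of the inspected variables held anything other than the error page itself.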

  33. I have now implemented a similar system to this in PHP. Thanks for the interesting article; it gave me some great ideas.

  34. So many requirements for a 404 error page. The statistics on 404 page visits, given as percentages, are interesting. I’d like to see examples of 404 pages. For myself, I chose this functionality for my 404 page: http://www.kontain.com/bickov/entries/36756/page-404/

  35. I have been developing web sites for a number of years now and I am always surprised at the lack of attention paid to 404 pages by both developers and designers. I have always harvested 404 data and was very pleased to find this article and see that there are web developers and designers out there who understand how the “404” should be used to teach us about issues our end users are encountering.  This technique has saved me shame and insult because I can proactively find issues.

  36. How can you translate the plugin to another language?

    Thank you.
