A More Useful 404

by Dean Frickey

56 Reader Comments

  1. It’s true, a 404 page isn’t often at the forefront of the designer’s or developer’s mind when building a site, but it really is important. One error or bad link is often enough to make users leave and never look back, but if your 404 page soothes them, makes them feel that you’re sorry for any inconvenience, and shows that you want to help them find what they’re looking for, then you stand a good chance of keeping the visitor.

    All good stuff!

  2. Hi Dean,

    of course you can do something about wrong links in search results. The best solution would be to redirect the user to the correct page, assuming that just the link has changed.

    Otherwise, you can use Google Webmaster Tools, Yahoo! Site Explorer, or any of the other search engines’ webmaster tools to remove that page from their index and give users a better experience on the web.

    Regards,
    Olaf


    Olaf Offick
    http://www.learn-skills.org

  3. Has anyone used the Google 404 code from Webmaster Tools?

    It’s a JavaScript snippet you put in the body of the page, and it will try to suggest other pages on your site (that are in the Google index) that match the bad URL the user typed.

    Just wondering how well it works.

  4. Sometimes it is also a good idea to redirect a 404 to the index page. Especially if you have totally changed your site structure and can’t redirect each old URL to a new one, this might be an option. Otherwise you may lose a lot of link power.

  5. I’m sorry, but I just don’t subscribe to this notion at all.

    For one, the execution could easily result in lots of emails from any number of badly configured web spiders. I mean, we’ve all seen the number of 404s our sites get.

    It’s certainly not unique to get reports on the location of outdated links. But the article is just a taster of what is possible. It would be much more worthwhile to see an article on how to utilize something similar as part of a 500, with a complete debug/trace going to the developer. Perhaps that’s the coder in me speaking, and it has little place on ALA.

  6. I’m currently using Ruby on Rails as my default web application framework, and it makes it incredibly easy to handle these missing requests. Simply create a “catch-all” controller that will log what the user requested and the number of times it has been requested, and you can put in the logic for directing users to a proper page (say you’ve analyzed where multiple users are going, and you know what they are trying to get to).

  7. This is helpful since handling errors is often forgotten about during the rush to go live.

    If you are scripting this type of thing, it may be useful to log all 404 errors per session/IP address/IP range and choose some sort of threshold to terminate a session or temporarily ban the IP address. The threshold level will depend on the sensitivity of data on the site and, say, whether a user is logged in. If there are many ‘not founds’ in a short period of time, this can be an indicator of someone scanning the site. But if you have opted in to something that scans in this way (perhaps a remote vulnerability assessment tool), you’ll need to exclude that from any filtering. 404 logging should also be correlated with server error logging (as Peter alludes to above).

    When using any data that can be modified by a user, such as HTTP_REFERER or REQUEST_URI, be very careful about using it in scripts, writing it to your database, including it in an email, or displaying it back to the screen. If you are not careful, these could lead to added vulnerabilities in the web site. (A sketch of this kind of escaping follows at the end of this comment.)

    The 404 page/script should also return a 404 ‘Not found’ HTTP status code.  Interestingly on ALA, the link to your (Dean Frickey’s) details:

    http://www.alistapart.com/authors/f/deanfrickey

    returns a ‘not found’ type of page, but the status code is ‘200 OK’ like other ‘not found’ errors on ALA. Reference:

    http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
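
    A minimal PHP sketch of the escaping described above, assuming a PHP port of the 404 script (the helper names here are illustrative, not from the article):

    <?php
    // Treat HTTP_REFERER and REQUEST_URI as untrusted input.
    // Escape before echoing into the 404 page, and strip newlines before
    // placing values in a notification e-mail (to block header injection).
    function safe_for_html($value) {
        return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    }

    function safe_for_mail($value) {
        return str_replace(array("\r", "\n"), ' ', $value);
    }

    $referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
    $request = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '';

    echo '<p>Sorry, we could not find ' . safe_for_html($request) . '.</p>';
    ?>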

  8. Interestingly on ALA, the link to your (Dean Frickey’s) details:

    Temporary CMS hiccup. Sorry about that. Dean’s bio is of course online and the link works.

    The status code is ‘200 OK’ like other ‘not found’ errors on ALA

    Thanks for alerting us to the issue.

  9. This solution is a duplication of effort, more complex than it needs to be, and opens up a potential attack vector that could otherwise be closed.

    Web servers log the HTTP referer for every request, in addition to the user agent string, originating IP, etc. The same thing could be done with a script to pull out all 404s from the access log and analyze them the same way. If you want the script to e-mail you, it can be run in a cron job.

    Using the logs means not needing to run extra (interpreted) code for every 404 request to pull the same information from the environment that’s already available in the log. In addition, you can turn off server-side includes, which removes a potential exploit vector for your server.

    The goal of informing the developer when users are seeing 404s is laudable; the method proposed here is inelegant.
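
    A rough sketch of that log-driven approach, in PHP for consistency with the other comments here; the log path, its format (Apache combined), and the e-mail address are assumptions:

    <?php
    // Run from cron: scan the access log for 404s, group them by URL and
    // referer, and mail a summary to the developer.
    $log     = '/var/log/apache2/access.log';   // assumed location and combined format
    $pattern = '/^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)[^"]*" 404 \S+ "([^"]*)"/';

    $lines = file($log);
    if ($lines === false) {
        exit("Could not read $log\n");
    }

    $report = array();
    foreach ($lines as $line) {
        if (preg_match($pattern, $line, $m)) {
            $referer = ($m[2] !== '' && $m[2] !== '-') ? $m[2] : 'none';
            $key     = $m[1] . ' (referer: ' . $referer . ')';
            $report[$key] = isset($report[$key]) ? $report[$key] + 1 : 1;
        }
    }

    if ($report) {
        arsort($report);
        $body = '';
        foreach ($report as $entry => $count) {
            $body .= sprintf("%5d  %s\n", $count, $entry);
        }
        mail('webmaster@example.com', '404 report', $body);   // placeholder address
    }
    ?>

    Because it reads the log on a schedule instead of running on every 404, nothing extra executes at request time.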

  10. I think it’s at least important to send 404 headers.
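
    In PHP, for example, that is a one-liner, as long as it runs before any output is sent (a sketch, not tied to any particular framework):

    <?php
    // Must be called before any output, or the status line can’t be changed.
    header('HTTP/1.0 404 Not Found');
    ?>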

  11. For the search-engine-induced broken link, you can help speed up the search engine’s update by making a 301 redirect: either create the missing page, including only the redirection inside it, or, if you are more savvy, use your .htaccess to do it (much cleaner). A PHP sketch of the first option follows below.

    Info here: http://www.webconfs.com/how-to-redirect-a-webpage.php
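
    A minimal sketch of that first option – a stand-in page that only issues the 301 – with placeholder URLs:

    <?php
    // old-page.php: permanently redirect to the page’s new home.
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.example.com/new-page/');
    exit;
    ?>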

  12. Kevin, your comment regarding duplication of effort is absolutely correct; however, it only deals with half of the problem. This solution gives you the flexibility to handle the error message that appears to users, which is more important.

    Honestly, the e-mail part of this is likely unnecessary, as most basic server-based stats software will list these pages.  But the increased usability for the end user is stellar, and more sites should be implementing this thought process where possible.

  13. I agree with Chris. The idea here is not only to alert the webmaster (which certainly can be done in other ways) but also to provide better feedback to the user. In this light, I’d also be interested in experimenting with the Google 404 script that Jeremy mentioned. If it’s possible to include search results for the most likely page the user would have been looking for, that would add even more value.

  14. While I understand the comments above, I think this technique has great value to clients – especially those who never go ‘under the hood’ of their CMS. I can see myself coding this option in as a standard feature.

    I also appreciate the ever-present consideration of a well maintained site. (there are a lot out there that still are not.)

    Good show.

  15. One thing that wasn’t mentioned is the slew of personal firewalls/security programs that strip HTTP referer headers from all requests for privacy reasons. I would make note of that in the 404’s content for the first case, but still wouldn’t send an email.

  16. Certain software, such as the AVG virus checker:
    http://www.avg.com/special-toolbar-404-dns-error-tlbrc.tpl-mcr1
    hijacks 404 pages in the browser in order to display its own ad page.
    I find it impossible to use the 404 status header because of this.

  17. Olaf: You’re certainly correct that the developer can make some effort to help resolve bad links from search engines, and my statement that “there really isn’t anything I can do about it” is technically not accurate. I have made attempts to remove URLs from search engines, but in my opinion, working through the search engine’s web site, looking for a link to add/remove URLs, then actually going through the process takes too much time and effort. Not to mention that the list of search engines I check against already numbers 150.


    Stefan: Personally, I would never automatically redirect a user from a 404 page back to the home page (I’m assuming that you were referring to doing this automatically). If users don’t realize that they’ve been redirected, they’ll think the index page has the information they’re looking for, and this probably won’t be the case. However, the message provided to the user could easily contain a link to the site’s home page in addition to the search engine and site map that I showed in my article.


    Peter: I can appreciate your concerns. I have had discussions with other developers who suggest that this will generate an inordinate amount of e-mail; however, from my experience on a relatively large site, this has not been the case. If a spider follows a link to a page that doesn’t exist, the resulting e-mail message will allow me to correct the link and the 404 goes away. If the error is the result of a missing file that should be available for download, either because it’s moved or was never uploaded, this becomes very helpful in identifying and resolving those issues. However, if someone, or something, is just hitting the site looking for pages that don’t exist, no e-mail is even generated.


    Clerkendweller: Excellent comments. The original intent was to provide immediate feedback to the user; then I thought, “Well, why not inform the developer of the problem?” and that led to what I have. The idea of logging these errors has been discussed for exactly the reasons you mention. I’m leaving it for phase II.


    Kevin: Thanks for your input. I realize that server logs contain this same information; however, I don’t want to look through a log report, and more importantly, I want to know right now that I have a problem on my site. A simple e-mail does the trick.


    Daniel, Chris, George:  Thanks for your kind words.  Yes, a large part of this effort was driven by trying to provide the user with accurate and specific information with links that will help them find what they’re looking for.

  18. I’ve seen quite a few sites (mostly content focused) that take some of the query string parameters and use them as a search to produce a list of pages that the user might have been trying to reach. Not exactly guaranteed to pull the right page, but might save the user some time, and keep them from leaving the site altogether.

  19. First, as other people have said, you ought to give the user what they want: a way to find that content.

    Whether it’s “a simple search box”:http://www.mediauk.com/asjuhsdhiufds or a “simple site map”:http://www.absoluteradio.co.uk/askjhdskjhfds or the “Google 404 script”:http://james.cridland.net/blog-and-a-404 you’re missing a trick if you simply create a “nice looking 404 page”.

    Second, you forgot to stress that it should still return a 404 HTTP header (if not, you’re causing a LOAD of issues with Google). And you might also want to use custom Google Analytics code on the page too, to enable logging of everything in a viewable way.

    All this is for nothing if your custom 404 page is 512 bytes or less: if a visitor is running the Google Toolbar, Google serves its own, rather better, 404 error instead, overwriting yours.

    And I strongly recommend against firing off automated emails: if your site is even mildly attacked by bots trying to find a way into your SQL/membership data/credit card data, then you’ve made your problem a whole lot worse.

  20. 404 errors that aren’t custom create confusion for both the user and yourself. Awesome writeup on how to actually implement it.

    Thanks!

  21. Fine, I’m not a professional; I’m not even old enough to be classed as one. But personally, and from my point of view, can I congratulate the author on another great ALA article. It appeals to my more practical mind, and even made me update my 404 page.
    In response to everyone else’s comments, I would like to add my own thoughts: firstly, I rewrote this in PHP, hence reducing the security concerns, I think. (If someone more experienced would like to comment on this, please do; I love being proved wrong.) Secondly, I think that automated emails do have their advantages – having received 4 about one link motivated me to do something about it; logs are very good for statistics but don’t give me the imperative to do something (stress on the “me” there).
    And if James is following these comments, it would be interesting to know what harm emails cause.

  22. If a spider follows a link to a page that doesn’t exist, then the e-mail message from that will allow me to correct the link and the 404 goes away.

    Except that Slurp goes around deliberately making up URLs that it expects not to exist, so that it can check the site is correctly sending a 404 for non-existent pages (so it knows it can assume that a 200 page really is A-OK). You don’t want to be notified of every instance of this. I’m sure there is something in the user-agent string that you could look for, and use some sort of trickery to filter those out – a rough sketch follows below.
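
    One way to do that filtering in PHP, assuming a PHP port of the notification script; the list of bot substrings is illustrative, not exhaustive:

    <?php
    // Skip the e-mail when the request has no referer or comes from a known crawler.
    function looks_like_a_bot($user_agent) {
        $bots = array('Slurp', 'Googlebot', 'bingbot', 'msnbot', 'spider', 'crawler');
        foreach ($bots as $bot) {
            if (stripos($user_agent, $bot) !== false) {
                return true;
            }
        }
        return false;
    }

    $agent   = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

    if ($referer !== '' && !looks_like_a_bot($agent)) {
        // ...send the notification e-mail here...
    }
    ?>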

    A couple of readers have written, concerned that spiders and bots could result in a large number of e-mails being sent. But spiders and bots that are guessing at URLs will not generate e-mails, because they are not following bad links and therefore will (probably) not have an HTTP_REFERER. But as I mentioned, HTTP_REFERER can be faked, so I’m not going to say for certain that this is always the case with all spiders or bots. However, I have been using the ideas presented here for the past few years and have yet to experience any problems with spiders or bots accessing the site.

  24. I have taken this stuff into consideration when doing redesigns for a number of sites. Instead of sending emails, I created a database to capture that data and then allowed the client to provide a correct URL for common problem pages, turning the 404 into a redirect. It also lets the client see a list grouped by common pages and know how often each comes up.

    Much more useful than an email every time something comes up. Also more scalable than the method in the article.
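
    A rough sketch of that database-backed approach in PHP, assuming MySQL and a made-up table missing_urls(url, hits, redirect_to) with a unique key on url:

    <?php
    // Record the miss, then honour any correction the client has supplied.
    $pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');   // placeholder credentials
    $requested = $_SERVER['REQUEST_URI'];

    $stmt = $pdo->prepare(
        'INSERT INTO missing_urls (url, hits) VALUES (?, 1)
         ON DUPLICATE KEY UPDATE hits = hits + 1'
    );
    $stmt->execute(array($requested));

    $stmt = $pdo->prepare('SELECT redirect_to FROM missing_urls WHERE url = ?');
    $stmt->execute(array($requested));
    $target = $stmt->fetchColumn();

    if ($target) {
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: ' . $target);
        exit;
    }
    // Otherwise fall through to the normal “not found” page.
    ?>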

  25. Just one thing.

    If a user is reading a certain page on your site and manually types a new URL but gets a 404 error, wouldn’t that count as “a bad link on your site”?

    For example, someone who’s on www.domain.com types www.domain.com/contact in the address bar.

    Wouldn’t that send an HTTP_REFERER with your domain? Then you would get an email saying there’s a bad link on the index page when there really isn’t.

    Maybe I’m getting confused here, but I wanted to ask to make sure.

  26. I was looking for a place where I could get complete info about 404s. Thanks to the author for writing such a good post.

  27. Apache is more than happy to use a CGI as your ErrorDocument:

    ErrorDocument 404 /cgi-bin/404.pl

    If you don’t want that cgi-bin in the URL, just use:

    Alias /404-not-found /cgi-bin/404.pl
    ErrorDocument 404 /404-not-found

  28. Kevin Selles: It’s a good question, so thanks for asking it. The referer header is only sent by the browser when a link is clicked, so no, manually entering an incorrect URL will not generate an e-mail, regardless of the page you’re currently viewing.

  29. Dick Davies: You are correct that Apache could be configured to call the Perl script directly. But when doing this, the Perl script would be responsible for building the complete 404 page, with all of the elements and styles necessary to have the look and feel of the website. And it would be more difficult to access the styles and shared elements, which would be located somewhere under document root.

    By executing the Perl script from within the .shtml page, the design of the 404 page (i.e. headers, footers, navigation, etc.) is easy. If you have a template for your site, the 404 page is simply a template file with the line,
    <!--#include virtual="/cgi-bin/404.pl" -->
    inserted at the point where the content needs to appear.

  30. Dean Frickey: Many thanks for the explanation.

  31. Thanks for giving me the chance to join the discussion on this website. I hope I can get lots of info here.

  32. I thought mine was useful, but this goes to another level. Nice.

  33. I’ve created a WordPress plugin that mimics the behavior described in this article, with configurable options in the admin area. Enjoy!

    Download: “Useful 404s for WordPress”:http://skullbit.com/wordpress-plugin/useful-404s/

  34. Hi, I tried this with some of my sites and got better SERPs in only five days. I don’t know if this is the result of making my 404 page more friendly; I only know that this is the only change I made to my site. I also tried the WordPress plugin from Marcus and it works fine. Thanks.

  35. Great article! I really love the image also!

  36. Very nice, Marcus.

  37. Several readers have commented that I need to be sure to send the correct HTTP header in the response generated by my script. I’ll admit I hadn’t given this any thought, so I started looking into it. I used the Live HTTP Headers Firefox extension to watch the HTTP traffic. When I select a link or type in a URL to a page that doesn’t exist, I receive “HTTP/1.x 404 Not Found,” so it seems that Apache is sending the proper header. If anyone has more to add to this, please speak up.

  38. You say you have a text file of search engine referers – could you make this available? It would save a bit of time compiling one…

  39. Thanks Dean for your article! It has already helped me provide a better user experience on my websites.

    In the meantime I’ve created a Ruby script which mimics your Perl script.
    You can download it from the “‘More useful 404 error page’”:http://github.com/perfectionlabs/more-useful-404-error-page/ project on GitHub.

  40. If you are getting a 404 from search engine referrals, that’s because you’ve forgotten to set up a 301 redirect to the new URL. Otherwise you should have a custom 410 page saying that the resource was removed for good. See “RFC 2616”:www.w3.org/Protocols/rfc2616/rfc2616.html for the HTTP 1.1 specification.

  42. Jeremy Flint,

    We use the Google 404 widget on our university website in combination with Analytics, so we know when pages are broken. I would like to combine our current setup with some of the things that were discussed in this article. Google 404 works really well for us… here is an example: http://www.uwgb.edu/asdf

  43. I’ve been looking for some good code for this for a while, and I’m especially impressed how this takes all of the error scenarios into account in order to address the issue. What I didn’t see was an easy download link for the Perl script; am I just supposed to copy and paste all the code snippets on the page together?

    Also, I’ve noticed that when you get a 404 from sites like Google.com, it doesn’t display the actual 404 page URL, but displays the 404 page at the URL you typed.

    Example:

    http://www.google.com/oops404

    Even though there is no oops404 page, it looks like there is.  If someone wanted to integrate that functionality into this code, is that doable, and how would you go about it?

  44. Hi, this is a great article; I have converted most of the stuff to PHP.

    However, I can’t use $_SERVER[‘SERVER_NAME’] . $_SERVER[‘REQUEST_URI’] to determine the page the user was trying to get. I just get the .php error page that the user was forwarded to.

    Does anyone have the same problem? How did you solve it?

    Thanks
    Steven

     

  45. What is actually quite fun to play with is using the PHP similar_text() function (or anything similar) to match the requested URL against a list of valid URLs for the site and redirect to the closest match. Using certain limits as to how close the match must be, I use it to fix problems when a user mistypes one letter in a longish URL, or does a bad copy-and-paste job where they add or drop a letter.
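
    A sketch of that closest-match idea, assuming you can supply an array of valid URLs for the site (hard-coded here for illustration); the 90% cut-off is arbitrary:

    <?php
    $valid_urls = array('/about/', '/contact/', '/articles/more-useful-404/');
    $requested  = $_SERVER['REQUEST_URI'];

    $best_url   = null;
    $best_score = 0;
    foreach ($valid_urls as $url) {
        similar_text($requested, $url, $percent);   // $percent is set by reference
        if ($percent > $best_score) {
            $best_score = $percent;
            $best_url   = $url;
        }
    }

    // Only redirect when the match is very close; otherwise show the 404 page.
    if ($best_score >= 90) {
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: ' . $best_url);
        exit;
    }
    ?>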

  46. How could there be a more useful 404 when you already published… “The Perfect 404”:
    http://www.alistapart.com/articles/perfect404
    by Ian Lloyd January 16, 2004
    Just kidding, good stuff.

  47. I didn’t take the time to read through all of the comments, so I don’t know if this was already suggested. For my 404 pages I utilize my site’s search functionality. If a URL is mistyped I have a little message that says “We couldn’t find the page you requested. Did you mean one of these:” Then I display the search results from my site search.
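
    A small PHP sketch of that approach – turning the requested path into search terms and pointing at a site search page; the /search?q= URL is a placeholder for whatever your site uses:

    <?php
    $path  = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
    $path  = preg_replace('/\.[a-z0-9]+$/i', '', $path);      // drop any file extension
    $terms = trim(preg_replace('/[\/_\-+]+/', ' ', $path));   // separators become spaces

    if ($terms !== '') {
        echo '<p>We couldn’t find the page you requested. ';
        echo '<a href="/search?q=' . urlencode($terms) . '">';
        echo 'Search the site for “' . htmlspecialchars($terms) . '”</a>?</p>';
    }
    ?>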

  48. In reply to the Google 404 widget post, another similar idea, the “Linkgraph”:http://linkgraph.net/ widget, was released in December 2008. It’s a tool like the “Google 404 widget”:http://googlewebmastercentral.blogspot.com/2008/08/make-your-404-pages-more-useful.html, only the Linkgraph widget uses a database of all previous URLs of a site’s pages to get the right URL when you click a broken link. Provided you got to the page through a broken link, of course.

  49. Hello, I’ve tried it, and I have to say it works great. Thanks.

  50. I feel that 404 pages are useless, mainly for SEO reasons. What I would do, and what we suggest doing, is putting a PHP 301 redirect above the header of your 404 page. While this does not give a great user experience (unless you rework the page it’s directing to, to signal that the page was not found), it does 301 any 404 page that comes up before the search engines can see that it’s a 404, thus preserving a percentage of the link juice and pushing it on to the page it redirects to.

  51. I agree with Bill that 404s are bad for SEO – however, 301-ing all 404s to a single other page isn’t quite optimal either. The optimal (if sometimes unattainable) experience would be for any old page/URL to get 301-redirected to the new/working page that is most applicable to the old page. This is really common when our clients move to a new CMS, for example. The old pages (the content of them) are still on the new site, but all at new URLs. We’ve actually done this so many times we built a tool called www.errorlytics.com that does exactly this quickly and easily for sites running PHP, JS, and Rails… and has WordPress and Drupal plugins/modules. The idea of the tool is to 1) make you aware of the often ignored 404 and 2) make it so you can get rid of them via 301 redirects, thus preserving SEO and the end-user experience.

    The key edge that Errorlytics offers over tools like Linkgraph is that it not only gets the user who has requested a bad/dead URL to a page that has some content on it, but does this via an SEO-friendly 301, so even the spiders are happy. Using a frameset and a meta refresh is far from optimal on the SEO side.

  52. Steven, I just tried to mimic this with .htaccess and PHP, and for me $_SERVER[“SCRIPT_URI”] returned the URL of the page the user asked for. You can temporarily point your 404 to a .php page containing

    <?php phpinfo(); ?>

    then go to any non-existent webpage on your website; the 404 page will show the PHP info page. Just examine the variables it shows and I’m sure you will find the webpage you entered, not only the name of the 404 script. Just do not forget to restore the 404 to your original script after fixing the problem.
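
    For what it’s worth, with an internal Apache ErrorDocument redirect the original address usually shows up in REDIRECT_URL (and often still in REQUEST_URI); if the ErrorDocument points at a full http:// URL, Apache issues an external redirect instead and the original address is lost, which may be what Steven is seeing. A minimal sketch:

    <?php
    // Prefer REDIRECT_URL, which Apache sets on internal ErrorDocument redirects;
    // fall back to REQUEST_URI, which usually still holds the original request line.
    if (!empty($_SERVER['REDIRECT_URL'])) {
        $requested = $_SERVER['REDIRECT_URL'];
    } else {
        $requested = $_SERVER['REQUEST_URI'];
    }
    echo 'You asked for: ' . htmlspecialchars($requested, ENT_QUOTES, 'UTF-8');
    ?>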

  53. I have now implemented a similar system to this in PHP. Thanks for the interesting article; it gave me some great ideas.

  54. There are so many requirements for a 404 error page. It would be interesting to see visit statistics for 404 pages, as percentages. I’d like to see examples of 404 pages. For myself, I chose this functionality for my 404 page: http://www.kontain.com/bickov/entries/36756/page-404/

  55. I have been developing web sites for a number of years now and I am always surprised at the lack of attention paid to 404 pages by both developers and designers. I have always harvested 404 data and was very pleased to find this article and see that there are web developers and designers out there who understand how the “404” should be used to teach us about issues our end users are encountering. This technique has saved me shame and insult because I can proactively find issues.

  56. How can you translate the plugin to another language?

    Thank you.
