Graceful E-Mail Obfuscation
Issue № 248

In “Win the SPAM Arms Race” (A List Apart, May 2002), Dan Benjamin talked about the importance of hiding e-mail addresses on our websites from vicious, e-mail address harvesting bots—or spam bots, as they are more often called. Dan pioneered a JavaScript-based solution for bypassing the indexing mechanisms that spam bots use. Here’s a quote from the article:

Posting a naked e-mail link anywhere on the web (or in a newsgroup, in a chatroom, on a weblog comments page…) is generally the kiss of death for your once-healthy address.

It’s hard to believe, but it’s been more than five years since Dan wrote these words. So, did we win the SPAM Arms Race? As you may have noticed by looking at your own inbox recently, not exactly. The Messaging Anti-Abuse Working Group (MAAWG) estimates that 90 billion spam messages are sent every day, and 80–85% of all incoming mail is abusive.

A shared responsibility

Many web users don’t understand the inevitable consequences of exposing their e-mail address on the web. Experienced web developers and website owners, however, do. Thousands of spam bots tirelessly crawl the web to collect e-mail addresses exposed on websites, in blog comments, and elsewhere. These addresses end up in databases sold to unsavory marketers, who bombard the owners’ inboxes with unsolicited mail.

Of course, spam is an increasingly complicated problem that can never be solved by the efforts of web developers alone. But don’t underestimate your own powers.

An unpleasant surprise

I work for a large non-profit organization that provides social services for the blind and visually impaired. After Wim, our system administrator, complained about the massive amounts of spam our mail server had to process, we started a small investigation. It turned out that 90% of all spam was sent to a mere 5% of the e-mail addresses we own, and guess what? They were exactly the addresses that had been published on our website.

Although most of the damage had been done by then (remember Dan’s quote), I promised Wim I would come up with an effective way to protect the addresses on our upcoming portal, on which we intend to publish even more addresses.

My solution would need to defeat spam and be accessible. We work intensely with and for people who have (mostly visual) disabilities. Accessibility is not an optional add-on.

A few months ago, Wim very unexpectedly passed away (we miss you, Wim!). Since then, I have spent a lot of time thinking about a way to fight spam bots. In this article, I’ll share my ideas on the subject and leave you with a working script to build on or to use in your own projects right away.

The problem with current techniques

Wikipedia has an excellent overview of anti-spam techniques. The article also includes interesting links to articles about e-mail obfuscation (Google the subject for more). Over the years, I’ve tried more than a dozen of these techniques. Although most seem effective, I can’t use them in my projects, as every one fails to meet one or more essential requirements. My requirements are:

1. No hassle, please

You’ve certainly seen e-mail links that look like “mailto:contact_removethis@company.com” or “mailto:contact(at)company(dot)com”. If you’re like me, you probably don’t like to correct a deliberately misspelled e-mail address after you click on it. Moreover, users who don’t notice what’s wrong with the address will end up frustrated, because their message cannot be sent or delivered. Similar techniques require users to re-type a (correctly spelled) address that’s rendered as an image—which isn’t any better, of course.

Although they don’t require JavaScript, these methods of e-mail obfuscation add an unpleasant barrier to a task as trivial as sending an e-mail. Clearly, this is not the right way to treat visitors or (potential) customers. I want real, clickable e-mail links that work just as expected, but—at the same time—are immune to spam bots.

2. Graceful degradation

JavaScript-based techniques—like Dan’s—offer the seamless user experience I’m looking for. They’re all based on the simple fact that spam harvesters are incapable of parsing JavaScript or understanding DOM changes initiated by JavaScript events. Instead, spam harvesters try to extract e-mail addresses from raw HTML by using brute force algorithms—even Googlebot chokes on most of the JavaScript it comes upon. Only real browsers know how to handle JavaScript and can undo the obfuscation—either by stitching together document.writes or by using a more advanced, unobtrusive, event-based approach.

An important downside is that such solutions are not bulletproof. Visitors who surf the web without JavaScript support (whether by choice or not) are out of luck, because they’re treated as spam bots. These visitors include people using text browsers, old or incapable screenreaders, or mobile devices with limited capabilities. Other users have JavaScript turned off for security reasons or because of company policies. W3Schools estimates that, as of January 2007, 6% of internet users have no access to JavaScript. If you believe that’s too few to really care about, then maybe it’s time to reconsider why you strive to make your markup and CSS accommodate the 1.5% of IE 5.x users or the 1.3% of Safari users (again, W3Schools).

3. Install and forget

Most e-mail obfuscation techniques I’ve tried tend to be bothersome and time-consuming to implement because they have to be applied to each and every e-mail address that you want to protect. Most require you to use lengthy inline script elements and inline event handlers. They may also invalidate your markup.

I wanted a transparent and fully automated solution that I can set up once and never worry about again. That’s the only way I can guarantee that all addresses that appear on our website are safe—even the ones that show up in blog comments.

Putting it together

Enough talking. Let’s get our hands dirty.

The ingredients

You’ll need Apache 2 and PHP 4 or later. On the web server, the mod_rewrite module must be enabled and you should be able to set Apache directives through the use of .htaccess files. Most web hosts have this enabled by default, so you probably don’t have to worry about it. For help on these Apache-specific features, check out the Apache documentation.

Put on your masks

Setting up Graceful E-Mail Obfuscation (GEO) involves a few steps. The key is to replace all occurrences of mailto links with innocent-looking URLs. Take this e-mail link as an example:

<a href="mailto:sales@yourcompany.com">
  E-mail our sales department
</a>

After the server-side treatment (I’ll get to that in a minute), that same link will look like this:

<a href="contact/sales+yourcompany+com" rel="nofollow">
  E-mail our sales department
</a>

Let’s just take this one step further and apply some basic ROT13 to it.

<a href="contact/fnyrf+lbhepbzcnal+pbz" rel="nofollow">
  E-mail our sales department
</a>

From the results of web exposure tests I did with freshly created addresses, the ROT13 encryption did not seem to be necessary for the technique to be effective. However, it does add an interesting level of obfuscation that certainly won’t do any harm either. If you’re not familiar with ROT13, I should note that it doesn’t add real cryptographic security. Wikipedia offers an accurate description of what ROT13 does:

Applying ROT13 to a piece of text merely requires examining its alphabetic characters and replacing each one by the letter 13 places further along in the alphabet, wrapping back to the beginning if necessary

There are a couple of other things to note here:

  • I chose “contact” as a faux folder name for this example, but you can choose anything you like. To substitute the “@” and the dot in the address, I opted for a “+”. A “+” is typically not allowed in real e-mail addresses and it doesn’t have to be URL-encoded—which will come in handy later on.
  • The rel="nofollow" part is added to instruct search engines that they don’t need to follow these links and index subsequent pages. Read more about rel="nofollow" on Microformats.org.
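The ROT13 step quoted above is small enough to sketch in a few lines of JavaScript. This is only an illustration: rot13() is a hypothetical helper name, and the code is not taken from the downloadable files. It shifts letters only, so digits, dots, and the “+” separators pass through untouched.

```javascript
// Sketch only: rot13() is a hypothetical helper, not the code from the download.
function rot13(text) {
  return text.replace(/[a-z]/gi, function (ch) {
    // Pick the start of the alphabet that matches the letter's case.
    var base = ch <= 'Z' ? 65 : 97;
    return String.fromCharCode((ch.charCodeAt(0) - base + 13) % 26 + base);
  });
}

var encoded = rot13('sales+yourcompany+com'); // "fnyrf+lbhepbzcnal+pbz"
var decoded = rot13(encoded);                 // "sales+yourcompany+com"
```

Because the alphabet has 26 letters, applying ROT13 twice returns the original text, which is why the same function can serve for both masking and unmasking.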

Away with the mailtos! We’re left with plain old hyperlinks. Well, except that they’re broken, maybe; but we’ll fix that soon enough. As you can imagine, there’s very little chance that a spam bot will identify these links as e-mail links—because…they’re not.

The script

To replace each occurrence of a mailto link in a given webpage with a regular URL, I’ll use a PHP search-and-replace regular expression. The URL notation reuses parts of the original e-mail address so that it can be reconstructed later on. For this, we’ll take the entire HTML page as the subject of a PHP preg_replace() function:

function encrypt_mailto($buffer) {
  // Swap every quoted mailto: address for a masked contact/ URL.
  return preg_replace(
    '/"mailto:([A-Za-z0-9._%-]+)@([A-Za-z0-9._%-]+)\.([A-Za-z]{2,4})"/',
    '"contact/\1+\2+\3" rel="nofollow"',
    $buffer
  );
}

With ROT13 enabled, the encrypt_mailto() function looks quite a bit longer, as you’ll see in the finalized PHP class that you can download at the end of the article.
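To give a rough idea of what the ROT13-enabled variant involves, here is a sketch of the same search-and-replace written in JavaScript. It is an assumption-laden illustration, not the PHP class from the download: encryptMailto() and rot13() are hypothetical names, and the real work happens server-side in PHP. The point it shows is that the replacement has to become a callback, so that each captured part can be rotated before the URL is assembled.

```javascript
// Hypothetical JavaScript port of the idea; the article's real class is PHP.
function rot13(text) {
  return text.replace(/[a-z]/gi, function (ch) {
    var base = ch <= 'Z' ? 65 : 97;
    return String.fromCharCode((ch.charCodeAt(0) - base + 13) % 26 + base);
  });
}

function encryptMailto(html) {
  var pattern = /"mailto:([A-Za-z0-9._%-]+)@([A-Za-z0-9._%-]+)\.([A-Za-z]{2,4})"/g;
  return html.replace(pattern, function (match, name, domain, tld) {
    // Rotate the letters, then join the parts with "+" instead of "@" and ".".
    return '"contact/' + rot13(name + '+' + domain + '+' + tld) + '" rel="nofollow"';
  });
}

var page = '<a href="mailto:sales@yourcompany.com">E-mail our sales department</a>';
var masked = encryptMailto(page);
// masked: <a href="contact/fnyrf+lbhepbzcnal+pbz" rel="nofollow">E-mail our sales department</a>
```

Markup without mailto links passes through unchanged, which is what makes it safe to run over every page.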

Now I want the script to intercept and parse all HTML pages before they’re sent to the browser. I’ll use PHP’s output buffering mechanism for that. In its simplest form, output buffering is activated by using a callback function:

ob_start("encrypt_mailto");

Using .htaccess, plus PHP’s little-known, but powerful auto_prepend_file directive, we can now automate this process for an entire website or for specific folders only. If you add the following line to your .htaccess file, prepend.inc.php will be automatically included at the top of every PHP document that Apache serves.

php_value auto_prepend_file /yourpath/prepend.inc.php

The prepend.inc.php file in itself initiates the output buffering and runs the entire contents of the served pages through the encrypt_mailto() function.

Also note that for this prepending to work properly, you must make sure that PHP code in plain HTML documents (without the .php extension) is parsed by PHP as well. Add this line to the .htaccess file:

AddType application/x-httpd-php .php .htm .html

This might demand a bit more processing power from our web server, but it’s the easiest way to make sure that all our web pages get the server-side special treatment we need. If you’re using a CMS or some sort of application framework, you could opt to cache the server-side encryption.

Fixing the links

Now that we’ve effectively disguised our mailto links, let’s see what happens when someone clicks one of these funny “contact/...” links. Well, except for the Error 404 page: not much.

In the end, visitors shouldn’t notice anything unusual about our e-mail links. A few lines of JavaScript will help us to restore these links into their original shape. But wait: what about those 6% that have no JavaScript support? When JavaScript is not available, our “contact/” URLs will not be “decrypted” on the client side, resulting in a 404 error. Apache to the rescue!

Let’s configure Apache so that its mod_rewrite module will intercept all URL requests that match the pattern we defined earlier. Apache will then derive the segments that make up the e-mail address from the URL and pass them quietly to an intervening PHP script that undoes the ROT 13 encryption and prepares the address for further processing. This is what the Apache rewrite rule looks like (line wraps marked » —Ed.):

RewriteRule ^.*contact/([A-Za-z0-9._%-]*)\+ »
([A-Za-z0-9._%-]*)\+([A-Za-z.]{2,4})$ »
/yourpath/mail.php?n=$1&d=$2&t=$3 [L]

Note that I had to split the regular expression to fit on this page, but you can download an example .htaccess file at the end of the article.
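If the rule is hard to read, it may help to see the same mapping spelled out in JavaScript. Apache does this work itself, so the function below is purely illustrative: rewriteContactUrl() is a hypothetical name, and the plus signs are escaped so that they match literally rather than acting as quantifiers.

```javascript
// Sketch of what the rewrite rule does, written in JavaScript for illustration.
// Apache performs this step itself; rewriteContactUrl() is a hypothetical name.
function rewriteContactUrl(path) {
  var rule = /^.*contact\/([A-Za-z0-9._%-]*)\+([A-Za-z0-9._%-]*)\+([A-Za-z.]{2,4})$/;
  var parts = rule.exec(path);
  if (!parts) {
    return null; // Not a masked e-mail link: the rule does not apply.
  }
  // n = name, d = domain, t = top-level domain (all still ROT13-encoded).
  return '/yourpath/mail.php?n=' + parts[1] + '&d=' + parts[2] + '&t=' + parts[3];
}

var rewritten = rewriteContactUrl('contact/fnyrf+lbhepbzcnal+pbz');
// rewritten: "/yourpath/mail.php?n=fnyrf&d=lbhepbzcnal&t=pbz"
var untouched = rewriteContactUrl('about/team.html');
// untouched: null
```

The three query parameters still hold the encoded segments, so the address never appears in the rewritten URL either.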

Providing an elegant fallback solution

Here comes the fun part! Coming up with a safe, elegant, and easy-to-use (or “graceful”) alternative for visitors to send an e-mail when JavaScript is unavailable is where your own imagination comes into play. How you do it depends on the type of website you’re using it for, but I don’t recommend a visual captcha for this purpose: it’s quite likely that people who get to see this non-JavaScript page cannot see the captcha image either, because they’re using a screen reader to compensate for a visual impairment or because they’re using a text browser.

One solution would be to offer users a simple contact form that allows them to send a message without giving away the actual address. And if your website already uses a contact form, you could choose to redirect “unencoded” mailto links to that page.

In most cases, however, people do want the actual address. So, for this example, I decided to prompt the user with a question that’s hard for a spam bot to answer, but easy enough for humans. If the right answer is given, the script can safely assume that it’s not dealing with a spam bot and reveal the actual e-mail address.

To see how this works, take a look at the demo page I put together. Be sure to turn off JavaScript to see the degradation in action. If you’re using the Web Developer Toolbar for Firefox, choose Disable > JavaScript > All JavaScript.
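The question-and-answer check described above could be sketched along these lines. Everything here is an assumption made for illustration: the question, the accepted answers, and the function names are hypothetical, and the article does not show the code behind its demo page. In practice the check belongs on the server (mail.php in this setup); it is written in JavaScript only to keep the examples in one language.

```javascript
// Hypothetical challenge: the question, answers, and names are assumptions.
var challenge = {
  question: 'How many days are there in a week?',
  answers: ['7', 'seven']
};

// Accepts the reply regardless of surrounding spaces or letter case.
function isHuman(reply) {
  var normalized = String(reply).trim().toLowerCase();
  return challenge.answers.indexOf(normalized) !== -1;
}

// Only hand out the real address after a correct answer.
function revealAddress(reply, address) {
  return isHuman(reply) ? 'mailto:' + address : null;
}

var granted = revealAddress(' Seven ', 'sales@yourcompany.com');
// granted: "mailto:sales@yourcompany.com"
var refused = revealAddress('8', 'sales@yourcompany.com');
// refused: null
```

A plain-text question like this stays usable in screen readers and text browsers, which is the reason for preferring it to an image captcha here.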

JavaScript for the rest of us

Now that we’ve implemented a non-JavaScript fallback, let’s make sure that the other 94% of users won’t notice anything “funny” about our carefully masked e-mail addresses. So, let’s revert the page’s DOM to what it looked like before the page’s source code was modified by the PHP script.

First, we need a JavaScript search and replace regex that does exactly the opposite of what our PHP regex did. I wrote a function around it that looks like this:

function geo_decode(anchor) {
  var href = anchor.getAttribute('href');
  if (!href) { return; } // Anchors without an href have nothing to decode.
  // Rebuild name@domain.tld from contact/name+domain+tld.
  var address = href.replace(
    /.*contact\/([a-z0-9._%-]+)\+([a-z0-9._%-]+)\+([a-z.]+)/i,
    '$1' + '@' + '$2' + '.' + '$3');
  if (href != address) {
    anchor.setAttribute('href', 'mailto:' + address);
  }
}

Next, we must loop through all anchors on the page and tie the geo_decode() function to each anchor’s onclick handler. The loop can only find the anchors once the document has loaded, so let’s run it from window.onload:

window.onload = function () {
  var links = document.getElementsByTagName('a');
  for (var l = 0; l < links.length; l++) {
    links[l].onclick = function () {
      geo_decode(this);
    };
  }
};

To make things run smoothly, a little more code is involved. Take a look at geo.js.php to see how I implemented the ROT13 “decryption.” If you read through geo.phpclass.php, you’ll see that the link to geo.js.php (the file that restores your mailto links) is auto-inserted right before closing the head tag with the help of PHP’s output buffering. This means that you don’t have to add a single line of code to your existing documents to make the script work.
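For readers who want a feel for the ROT13 “decryption” before opening the source files, here is a minimal sketch. It is not the code from geo.js.php, which may well differ: decodeHref() and rot13() are hypothetical names, and the function works on a plain string so that it can be tried outside a browser.

```javascript
// Sketch only: the author's actual implementation lives in geo.js.php and may differ.
function rot13(text) {
  return text.replace(/[a-z]/gi, function (ch) {
    var base = ch <= 'Z' ? 65 : 97;
    return String.fromCharCode((ch.charCodeAt(0) - base + 13) % 26 + base);
  });
}

// Turns "contact/fnyrf+lbhepbzcnal+pbz" back into "mailto:sales@yourcompany.com".
// Returns the href unchanged when it is not a masked e-mail link.
function decodeHref(href) {
  var parts = /contact\/([a-z0-9._%-]+)\+([a-z0-9._%-]+)\+([a-z.]+)$/i.exec(href);
  if (!parts) {
    return href;
  }
  return 'mailto:' + rot13(parts[1]) + '@' + rot13(parts[2]) + '.' + rot13(parts[3]);
}

var restored = decodeHref('contact/fnyrf+lbhepbzcnal+pbz');
// restored: "mailto:sales@yourcompany.com"
```

In a page, the result would be written back with setAttribute('href', ...) on the anchor, in the same way geo_decode() does.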

Try it yourself

I’ve set up a demo page for you to experiment with, and you can also play around with the source files:

  • .htaccess contains the Apache directives to prepend geo.prepend.php and to redirect page requests using mod_rewrite.
  • geo.prepend.php instantiates the PHP class and sets some custom properties.
  • geo.phpclass.php contains the PHP class that does the “encoding” and inserts a script tag before the closing head element that loads geo.js.php.
  • geo.js.php contains the JavaScript that’s responsible for the “decoding.”
  • mail.php contains an example of a usable fallback script for when JavaScript is unavailable.

…or download the ZIP archive (8 kB).

The script works in all major browsers, including Internet Explorer 5.01.

A solution. For now.

Alas, no e-mail address that appears online is entirely safe. Until all spam is banned from this world, we have to try our best not to make it too easy for spam harvesters to steal our addresses (and make money out of them). Now you can protect your addresses in a fully automated way while at the same time being gracious to all users, so you can focus on what’s really important: getting your content out.

This is only an interim solution. We should all be planning for the day when spam bots get smarter, and outwit them when they do. We should not pretend that legislation alone will be the silver bullet to address the world’s spam problem, so web developers will have to continue to come up with creative solutions to fight the problem—and masking your addresses is one of them. I look forward to reading your comments and suggestions.

About the Author

Roel Van Gils

Roel Van Gils is an internet consultant who specializes in website usability and accessibility. He carries out expert reviews and user tests with people who have disabilities, and helps advocate accessibility in his country. He also co-authored the AnySurfer Guidelines for Accessible Websites that are endorsed by the Belgian government. Roel lives and works in Ghent, a wonderful medieval city in Belgium.

101 Reader Comments

  1. Thanks.
    Would a possible alternative be to read the visitor’s browser type and only write real email addresses for human-operated browsers? It might not be perfect as you’d have to be quite studious about including all possible browser alternatives, but I think it would work quite nicely and not need a lot of code.
    (However, it wouldn’t surprise me if some spambots disguise themselves as browsers)

  2. Does this not rely slightly on security by obscurity? Now you’ve published the code, a determined spam harvester could update their bots to check for that regexp, and if it matches, decode the true address?

  3. I came to this article expecting some kind of ineffective hex-encoded email address obfuscation, but instead I found a great idea which could make a big difference to email harvesters (however big a part harvesters are to the overall spam problem).

    Of course the best way to make this work is for everyone who uses this technique to implement it slightly differently – be that using a different URL base than “contact/”, or using different encoding for the address or something else. As soon as a bot writer can’t make assumptions about the information being hunted then it gets a lot harder to write an effective bot.

    It’s definitely something I’ll be looking to implement soon.

  4. What about addresses that have more than one . after the @?

    The example code assumes that no one would have a TLD like .co.uk or .com.au etc. In reality it is not possible to encode both . and @ with the same value.

    Bart.

  5. I find it easier to just sacrifice an email address and use Gmail – much nicer, much easier for everyone concerned, and uses the best Spam filter in the world.

    I’d rather place the workload of trying to get in touch at my end rather than the client’s, and so I should be the one working to sift through the emails.

  6. This is one of my pet peeves: the plus sign IS a valid email address character. Postfix uses it by default to separate the “real user” from a meaningless suffix. So, for instance, mail to foo+bar@baz.com would actually be delivered to foo@baz.com. This would be a great way to track who sold the email address (e.g. by giving out foo+amazon@baz.com), if it weren’t for all the moronic sites who think the plus sign makes it invalid.

    (In fact, a surprising amount of stuff is allowed to the left of the at sign, because only the user’s email server should be interpreting it. Not you!)

  7. I have to point out that plus (+) is *certainly* allowed in the local part of an email address (RFC 2822: http://tools.ietf.org/html/rfc2822).

    It always annoys me when I try and sign up to a site with an email address with a plus in it, and can’t because there is some needlessly overly-restrictive regular expression sitting in the background.

  8. @Gareth Adams:

    You’re right. I’m aware of the fact that, according to the RFC, a plus sign is allowed for the local part of an e-mail address. In reality, however, e-mail service providers typically don’t allow user to create addresses that contain a plus sign. I did point this out in the article.

    Though, you can easily adapt the regex (both the JavaScript and the PHP one) so that the @ is replaced with something else (instead of ‘+’).

  9. Why are we discussing this? Just use email with a good filter like SpamAssassin, or route all your mail through Gmail and/or Google Apps for your Domain.
    I have my email address plastered all over the internet, and I see maybe 1 SPAM per week in my inbox, while my gmail spam folder fills up with 100+ per day. The false-negative rate is zero, as far as I can tell for the past several months. So instead of trying ultimately futile methods of security through obscurity, just let Google or someone else do the hard work for you.

  10. @joe:

    So the idea of thousands of spam messages being sent to your mail server doesn’t bother you as long as you don’t see them in your in-box? Does it concern you that bandwidth is being wasted on these messages?

    To me it’s like heating your house in the summer and then running air conditioning to lessen the heat. Sure, if your air conditioner is powerful enough, you can lower the temperature to a comfortable level — but look at all the power you’ll waste in the process.

    I agree that no solution is perfect (so does the author, and says so) and that some spam will inevitably find its way to your server. But it’s still better to try to prevent addresses from being harvested. Belt and suspenders. Better levees and better evacuation procedures.

  11. “I wonder why it is not mentioned to ‘build’ the email address with a mouse-over and an innerhtml javascript swap.”

    Possibly because this is then unusable to anyone who can’t use a mouse?

    For me, the bigger problem is that it requires a specific server type, or use of a specific language. I want something that I can use on different platforms, with different server languages that meets the original requirements *plus* platform independence.

    But I accept I’m probably in cloud-cuckoo land for the time being: so I’ll just stick to my contact forms and/or published email addresses with spam filters…

  12. I think you’re missing the point when you’re saying that “e-mail service providers typically don’t allow user to create addresses that contain a +” The point of plus addressing is to *add on* to your email address. It’s a great “native” feature to fight spam by allowing you to track who sold your email or block specific incoming emails, all from a single email account… (It works great with gmail btw).

    I think your solution is interesting (although it would seem that it might strain the server too much for what it does, but that’s just a guess at this point) but since this article is on ALA, it would be good if it was updated to replace “+” in your regex. I think most people will just use your script without modifications and thus ALA will effectively participate in indirectly furthering the obsolescence of plus addressing which is itself a great anti-spam feature… Sort of ironic since the point of your article is to fight spam 😉 No?

  13. “A “+” is typically not allowed in real e-mail addresses and it doesn’t have to be URL-encoded”

    That’s wrong on both accounts, actually. As Yann B already pointed out, “+” is perfectly valid in an email address. As for URL encoding, if you don’t encode the “+”, it gets turned into a space when the query string is decoded. So, unless that’s the desired effect, you *DO* have to encode “+” in a URL.

  14. On sites I build, I don’t have email links anymore. Everything goes to a form, which in turn has a logic question (not a captcha image) to defeat bots. Yes, that presents a certain barrier to users, but if the forms are accessible and the user even slightly motivated, it works. The email servers are no longer swamped and the request forms don’t become porn magnets.

  15. @joe:

    Agreed. The bandwidth is already being used, and it’s Google’s, thusly, no spam will ever end up on any server belonging to me. Not to mention, I won’t need to have this script run every time someone visits my page, so extra savings there as well.

    If someone really wants to take the hard way to stopping spam, maybe they should write a letter to their congressman. Not only is it less effective, but takes many times longer to see results!

  16. I run my own mail server for me and my family, and I encourage them to use a + in their email addresses for different sites, so they know which sites are sharing their e-mail addresses.

    It annoys me to no end when I try and sign up to a site and it says my e-mail address isn’t valid!

    I would agree with Yann B, that a lot of people are going to use the ALA scripts unmodified, and in that respect, would encourage fixing the code to allow the + to be used.

  17. I (also) wrote a tool for obfuscation that works better than most of the other online tools I’ve seen. It was written mostly for myself, but it does work, so maybe someone else can benefit from it. Details about why I think it’s awesome are on the FAQ page if anyone cares.

    The techniques it uses all suffer from the negative points mentioned in this article, however.

    http://www.obfuscatr.com/

  18. “In reality, however, e-mail service providers typically don’t allow user to create addresses that contain a +. I did point this out in the article.”

    Errm. You mean like Gmail. I’m always giving out e-mail addresses with a + in them. It allows for easy filtering.

  19. JZ notes: “So the idea of thousands of spam messages being sent to your mail server doesn’t bother you as long as you don’t see them in your in-box? Does it concern you that bandwidth is being wasted on these messages?”

    Of course bandwidth is a concern, but the solution isn’t an arms race, particularly an arms race where the website owner is degrading the usability or accessibility experience of the site visitor. By treating all visitors as bad until they confirm themselves as humans (or humans who correctly answer a generated question) is treating visitors as culprits.

    If you want to avoid email spam, don’t use email. If you insist on using email as a means of communication then it is your responsibility to deal with the implications of using email, and you should not belabour the visitor and treat them like a guilty party until they prove their innocence.

    The techniques in this article are both simple and easily reversible. Even the “human proof” question is easily scraped/parsed and answered. It’s generally a trivial regex to reduce the question into a form that can quickly be calculated.

    I’m watching black-hat SEOers and comment spammers automate the signup process on Blogger and Yahoo, their code works fairly stably, and both systems use Captcha and other anti-scripting techniques. (I did a presentation on this in the last Barcamp London)

    The problem here is that spammers have a monetary incentive to break through these flimsy defences. So any solution is merely temporary until the spammer is incentivised enough to spend an hour coding their way through these obstacles.

    There is no long term benefit with this solution, but there is a long term usability cost. How do visitors typically react when they are presumed guilty the first time they visit a website? Is that the experience we really want to be recommending?

  20. A very well written article as well as a brilliant idea. Thank you for sharing. I work for a public university so accessibility and security are also major concerns for us. We have had a lot of success using SpamSpan (http://www.spamspan.com/), which relies entirely on JavaScript and doesn’t embed quite as much data into a page as some other solutions. It is also based on the DOM and is easy for Contribute users to remove (which by default protects embedded forms and JavaScript).

    I do have a question about the solution you suggest to handle if a user has JavaScript disabled. The form presents a question to the user, which when answered correctly, proves they are not a machine. As a person working in an accessibility-minded social-services agency, do you think that this form poses a problem for persons with cognitive disabilities?

  21. While I certainly agree that a + sign is allowed in email, and many people do in fact use it as an anti-spam measure (though, to be accurate, it’s more about tracking spam than preventing it), I honestly can’t expect it to serve this purpose for very long.

    If it’s standard behavior for email servers to simply discard the + and anything after it, what’s to stop people from simply doing the same thing? If I run a site which collects email addresses, and I intend to sell them, I’ll most certainly just run a quick regex and strip the + and everything after it _before_ selling the address. Harvesters will likely do the same soon, if they don’t already.

    I’m not saying you shouldn’t use a + in email addresses you give out, just don’t pretend it’s a cure-all, when what limited usefulness it currently serves is bound to be lost in the not-so-distant future.

  22. Given the example in the article uses preg_replace() in the callback for the output buffer, a large page on a high traffic site can introduce some performance problems. The degraded version, however, has proven time and time again to help thwart spam. If you fill out a contact form (something like /contact/sales or /contact?who=sales), you still achieve the removal of the email address from the site, your users can still contact you, and you drop the expensive preg, the JS reversal of obfuscated addresses, and the Apache rewrite rules. While this probably goes without saying, be sure to benchmark before deployment.

    (Yes, it’s possible to automate the posting of data to the form in order to achieve the same goal, but this requires a custom tooling of a harvester which continues to be more effort than it is worth when there are thousands of other emails floating around on the Internet.)

  23. First off. Before I rant. Excellent article and ideas. I will be using the ideas and techniques in the future.

    And now, rant one:
    Congratulations on using some of the most unrepresentative of “average” stats available—W3Schools, a site visited mainly by Web designers and developers, not the general public. Let’s stop using stats as an argument whether or not to adopt something as best practice. There’s no such thing as a universally representative set of stats. Stats are only useful from your own site for analyzing your own users—and even then, it’s only useful for analyzing the users that you *currently* support (not the potential customers that are getting a sub-par experience through bugs, bad code or obtrusive practices).

    Rant two:
    I was going to mention the + as being valid. Others did. You also had a fine point: modify the RegEx to fit your own needs.

    Again, thanks for some excellent ideas. Please be responsible with those quotes of statistics. 😉

  24. I’d be very interested in seeing stats on the cost of processing these scripts on the server. Have you run a comparison study?

  25. I prefer using contact forms, especially on my work web site where there are many departments. I can use a pull-down to route the email to the appropriate department based on the inquiry, and the email addresses are maintained in a database so that should a contact person change I can change it in one place instead of searching the whole site to make changes.

    Digital Web recently had a helpful article on building “bulletproof contact forms” (http://www.digital-web.com/articles/bulletproof_contact_form_with_php/).

  26. An interesting read, but I was disappointed to see that it was very platform specific. ASP developers are out of luck.

  27. Great techniques posted here, but sadly there is one huge, gaping hole: using a real email address when registering a site. I made the mistake of registering with my real address (as opposed to either of my Yahoo and MSN spamcatcher addresses) and my spam went from 1 every few weeks to 300 a day in the space of 2 weeks!

  28. I found the article very interesting from a technical, problem-solving perspective, but as others have pointed out it’s not really necessary to go to this much trouble. My email address is all over the Internet and there’s nothing I can do about it, so I pipe all mail through Gmail and then access it with POP3 or IMAP. My own Web sites use mail forms instead of mailto links. I see perhaps one false-negative (spam not identified) per week and have never seen a false-positive.

  29. E-mail obfuscation is a solution, but to an invalid issue.

    You are missing the big picture. The ultimate objective is not to find a clever way to publish e-mail addresses that are readable by users but hidden from automated harvesters. Or to filter spam downstream, as some have suggested here. It’s to offer your readers/clients/users an effective way to reach you, quickly and easily. This is what really counts. And that is why you published an e-mail address in the first place. Since spam renders the e-mail approach ineffective, you need an alternative to client-side mailing.

    You need a server-side communication system. The solution is obvious: a mailing form, handled server-side, to initiate the contact. That way, there are no e-mail addresses to be harvested and users can still reach you. It’s that simple.

  30. Microformats are one of the biggest things on the horizon. The idea here is to provide visitors of your website with useful information including your email address.

    So one of the questions for me is how do we manage to provide this information without opening the flood gates to spamming. Any thoughts on that?

  31. Rudi Gens asks: “So one of the questions for me is how do we manage to provide this information without opening the flood gates to spamming. Any thoughts on that?”

    Tackle the problem at the source. Link email harvesters to spammers, and then prosecute email harvesters. See http://www.projecthoneypot.org/

    It’s better than wallpapering over cracks.

  32. I was checking out your example page in FireFox and selected the email addresses, right clicked, and selected View Selection Source, which converted the email addresses for me. I’m not sure what FireFox does to the code to generate that but it converted it for me in plain text.

  33. I don’t sell my visitors email addresses, but I purposely strip out everything after the + in their email addresses (IE, bob+fakenamegenerator becomes bob).

    It wouldn’t surprise me if people who sell email addresses “clean” them first, or if spammers “cleaned” their mailing lists before sending out spam.
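
    A minimal sketch of the clean-up described in this comment, assuming the tag is everything from the first "+" in the local part up to the "@". The function name `stripPlusTag` is hypothetical and is not part of the article's code.

```javascript
// Hypothetical helper: removes a "+tag" suffix from the local part of an address.
function stripPlusTag(address) {
  const at = address.lastIndexOf("@");
  // Without an "@", treat the whole string as the local part.
  const local = at === -1 ? address : address.slice(0, at);
  const domain = at === -1 ? "" : address.slice(at);
  const plus = local.indexOf("+");
  // Keep only what precedes the first "+", then re-attach the domain.
  return (plus === -1 ? local : local.slice(0, plus)) + domain;
}
```

    Because the tag can be removed this cheaply, a plus suffix on its own does not protect the underlying address.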

  34. Several people have commented that using forms eliminates the problem of spam from web sites. If only that were true. Spammers are already fully automated in posting to forms; one of my sites with just two forms produces nearly 1,000 bogus posts _every day_. Which means you’re right back to where you started… filtering for spam.

  35. Have to agree with Mike Davies comments made in his post of “The site visitor should not have to prove their innocence – they are not guilty”.

    Spam is a cost of doing business. E-mail spoofing is a more critical issue, I believe. Ensuring validity of e-Mail to clients and customers by use of a digital signature is a fundamental requirement, also.

    Attempts to defeat the keyboard banging monkey should never defeat the ease of use nor convenience for the customer to communicate.

  36. As Marty points out, the use of + to append to the local part of an email address will be ineffective as long as the suffix can easily be filtered out.

    The answer to this is to make it common practice to junk all mail which does not include a suffix that the account owner has given out and does not come from whitelisted contacts and does not include some secret.

    Usage: friends mail joe, registrations made to joe+ somesite, friends-of-friends use either eg. joe+ bill or include [a987sd] in subject, until they are whitelisted in their own right.

    Force people not to junk the + if they want to contact you. Then you deal only with spam that has been sent through friends’ addresses and have not required everyone to install PGP or something.
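
    The rule proposed in this comment can be sketched as a single predicate. This is only an illustration under assumptions: the message shape (`from`, `to`, `subject`), the `policy` object and the function name `shouldAccept` are all hypothetical, and the sample secret is the one quoted above.

```javascript
// Hypothetical filter: a message is kept when at least one condition holds.
//   1. the sender is whitelisted
//   2. the recipient address carries a "+suffix" the owner has handed out
//   3. the subject contains the shared secret
function shouldAccept(message, policy) {
  const local = message.to.split("@")[0];
  const plus = local.indexOf("+");
  const suffix = plus === -1 ? null : local.slice(plus + 1);

  if (policy.whitelist.includes(message.from)) return true;
  if (suffix !== null && policy.knownSuffixes.includes(suffix)) return true;
  if (message.subject.includes(policy.secret)) return true;
  return false; // everything else is junked
}

// Sample policy with made-up values.
const samplePolicy = {
  whitelist: ["friend@example.org"],
  knownSuffixes: ["somesite", "bill"],
  secret: "[a987sd]"
};
```

    Mail to the bare address from an unknown sender is rejected, which is what makes stripping the "+" suffix counterproductive for the sender.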

  37. This article just gave me a funky idea. I tested it with some browsers I had available, and it does seem to work.

    The idea is to use some “/contact.cgi/encoded-e-mail-address” URL, and then have the contact.cgi do a 3xx redirect to the mailto: URL.

    Sure, the bots can now try and follow all your links in the hopes they will find a redirect to an actual mail address there. But I think it would be prohibitively inefficient for them, and it would also leave a recognizable mark in the logs.

  38. Personally, I find ‘mailto:’ links annoying, so I try to keep first-time contacts limited to a form handled by PHP. However, a mailto link could be returned by PHP just as easy.

    For example, you could have a link like this:
    email Jim Turner

    In PHP, you could keep an array which says who everyone is or connect to mySQL and get the email addresses that way.

    <?php
    $who;   // the person requested by the link
    $email; // an array with Jim Turner in slot 1
    if ($who == "jim_turner") header("Location: mailto:" . $email[1]);
    ?>

    While it might be an annoying script to write, you could code this once and forget about it as long as you can remember your co-workers’ names.

  39. _[…]one of my sites with just two forms produces nearly 1,000 bogus posts every day._

    It means only one thing : you have designed forms for robots to use, not humans. Your forms don’t require any intelligence and they don’t speak “human”. You are still thinking forms as a bunch of input fields put together, and possibly you don’t validate the supplied data properly.

    Let me guess. You have published a form with inputs labelled so robots can use them:

    Name: [___________]
    E-Mail: [___________]
    Message:[_____________(multi-line textarea)_____________]

    How about a single textarea instead?

    _If you need to contact us, please leave us message in the following box. Don’t forget to identify yourself and give us a mean to reach you (a phone number or an e-mail address will just do fine):_ [____(textarea)____]

    There are a multitude of other creative ways to deal with spammers. Require intelligence.

    I have two questions for you. How many bogus posts do you see in this thread? How do you explain the difference between this thread and your 1000 bogus posts/day form design?

  40. bq. If you need to contact us, please leave us message in the following box. Don’t forget to identify yourself and give us a mean to reach you (a phone number or an e-mail address will just do fine): [_(textarea)_]

    A form that requires people to follow instructions? As if!

  41. “Contact the webmaster” doesn’t make it clear to the user that (s)he is about to hit a mailto link which will make their email client pop up (which is especially annoying when this user only uses webmail). “Email the webmaster” is better, but could still lead to a form.
    I think the best way is to use the actual email address as a label. But that means that the address is still in the code and can therefore be harvested. So I think I’ll stick to the name(at)domain.nl notation which will be translated into a mailto link using JavaScript.

  42. I’ve just checked two day’s worth of email and only 11% of the spam is to the address that’s included, without obfuscation, on every page of our company website. In our case, the biggest spam magnets by far are the people who read HTML emails and automatically download external images. Web bugs give the spammers an excellent record of which messages get through our filters and which don’t.

    That said, well done for a nice article with some interesting points. I’m just glad I don’t need it!

  43. I also tried the 3xx redirect to a mailto URL, and while it does indeed bring up an email client, it often leaves the browser with a blank white page. So by clicking a contact button, your site itself disappears.

    You could get around this by opening the contact link in a new window, so your site is still open, but then you end up opening two windows (browser and email), with one of them serving no purpose other than to confuse your users. Not recommended.

  44. So what about blogs on subdomains that have no access to scripts? Script-based solutions leave out millions of people on Blogger, LiveJournal, and WordPress who get spammed just as badly as people with root access to their own domains. I personally get a Nigerian scam and a you-just-won-the-lottery scam every single day, and this has been going on for months now (although my email addresses have been public on my blog for years, it’s only lately that spammers are so aggressive). Can only the people spending money on hosting get access to the best protection, or is there a way to equalize such protection among the masses? I hate the inequality here.

  45. marah marie: do you realize how cheap it is to get a domain/host these days? lj’ers/bloggers/etc etc etc clearly aren’t hosting solutions. they’re just available as featureless freebies. if you want access to php/etc you will usually have to pay for it like everyone else. hows that for equality?

  46. @Kit Grose: If you had read the comments (specifically, comment 9) — or looked at the code — you’d have discovered that this code is smart enough to take multiple periods into account.

  47. @Walter Wlodarski:

    bq. How about a single textarea instead?

    Which will be promptly filled with spam and submitted anyway. Lack of an “email” field will hardly be a deterrent.

    bq. How many bogus posts do you see in this thread? How do you explain the difference between this thread and your 1000 bogus posts/day form design?

    Most likely due to ALA requiring an actual login, and to being policed/filtered via script, not because of the particular form fields present. What I get on my feedback form is absolutely no different than the average comment spam posted automatically to any blog site (even though my site is not a blog). So my point–still–is that feedback forms end up requiring exactly the same kinds of spam filtering that email requires.

    Given that not all such feedback forms on all my sites produce volumes of spam, I’m assuming that these spammers keep lists of forms and form field names that they use for their junk. Some of my forms are now on their lists, and some are not. They probably sell those lists to each other, too, as the volume only increases with time.

  48. This will probably work great for a while, maybe even years. But one of these days spammers are going to get smart and start using something like htmlunit (http://htmlunit.sourceforge.net/) to emulate all browser functionality (including JavaScript) and get around even the fanciest solutions.

    I think the best long-term solution will always be spam filters. We have our company’s contact email out in the open, unobfuscated, on the site. What’s stopping the spam? Google Apps and the same spam filter that powers Gmail. We’ve had the address out there for over 2 years and received exactly 4 spam messages (that got to the inbox) in that time.

  49. It seems that we are doomed to be always behind the spammers, trying to find a solution for their damaging conduct. So far this kind of spam has never been a big issue for me, however, I agree with the previous comments that spam filters might be a more elegant long term solution to the problem than resorting to codes which are anyway potentially vulnerable.

  50. I’m joining the chorus: please do revise the published code to take RFC 2822 into account. Seems like ALA should be encouraging compliance with published standards, not brushing them aside.

    If you need another example of why you might want to use a plus sign in your email address, read this NYT article about a woman who had a “stillbirth at 31 weeks”:http://www.nytimes.com/2005/09/20/health/20case.html and was still getting baby-related mail a year later. Every time she got a portrait studio ad it reminded her how old her daughter should have been (“Smiling: Your 3-month-old!”). Using an address like janedoe+baby@domain.com, she could safely have registered at as many pregnancy sites as she wanted, and then if she needed to, she could shut off the baby email with one filter.

  51. I enjoyed the article and think that’s an interesting solution. However, it seems like an awful lot of work.

    For anyone interested, I came up with (yet another) JavaScript-based email obfuscater a few months back. I don’t think mine is better — in the end, all of these systems are hacks that will eventually be defeated — but I think mine is simpler and may be easier to use.

    http://pipwerks.com/journal/2007/06/13/email-address-obsfucation

    My version relies on JavaScript, but also degrades decently.

    Hopefully in the future we won’t even need to have this discussion! 🙂

  52. I couldn’t see anywhere else to submit mistakes / bug reports:

    When I try out the “demo page”:http://www.roelvangils.be/geo/demo/ with a non-JavaScript enabled browser, I keep being presented with the Turing test. Each time, I answer the question, I get a new page asking me a different question rather than the contact email address.

    I also note that in the article, the author states that for this technique to work, Apache 2 or greater is required. However the test site runs on Apache 1.3.37.

  53. @Anthony Geoghegan: that’s odd, because we’ve tested this on all major platform/browser combinations and it always worked fine (it’s just a PHP script that evaluates to true or false, so there’s not much that can go wrong). I dare to ask: you did *multiply* (and not add) the two numbers in the turing test, did you? 😉

    About the required Apache version: yup, you’re right. Apache 2 isn’t even required (I only found this out recently).

  54. Roel, you are correct to ask. For some reason, I read “sum” even though the text clearly said “product”. That’s what I get for staying on too late after work when I should be at home eating dinner.

    It’s a bit embarrassing that the first time I comment on ALA that I show myself being caught out by a (very) simple intelligence test. 🙁

    Thanks for a great article and top technique for defeating spam-bots.

  55. Until recently, I too was under the impression that bots can’t parse JavaScript to make sense of hidden addresses. But I’ve started working with Java’s JDIC WebBrowser object, and have realized how easy it would be for a Java-based bot to parse “post-processed” pages. And if it CAN be done, I’m assuming it IS.

    Bottom line: IF USERS CAN SEE THE ADDRESS, SO CAN BOTS!

  56. @C Deardorff: you’re right: bots/spiders with Java-based JavaScript parsing capabilities do exist, but I doubt if they are used for the purpose of e-mail harvesting _yet_ (because of speed issues etc.) Have you any idea about their ability to initiate events such as onclick/onmouseup and ‘see’ the DOM changes that happen as a reaction to that? Because that’s what happens with this script (the JavaScript processing of the page doesn’t just happen after loading).

    I have a collection of (newly created) e-mail addresses that have been published on several of my own websites for over 6 months now and are protected by this technique, and none of them seems to have been harvested yet (I don’t receive any spam on these addresses). So, for now (!), it all works fine. Fingers crossed?

  57. There seems to be a small thread building up within this comments list about “Why not just use forms?”.

    Using a form is what we do mostly on our websites, but it is not always the solution, since Roel’s solution is aimed at enabling “real” users to copy and paste an email address for use elsewhere. Whereas forms don’t enable the end user to save the email address for later use. Some clients are OK with this, some aren’t.

    The other mention was that forms are also prone to Spam. Yes they are, but the point of the article was to prevent harvesting of email addresses, not to prevent form spam, for which there are some good solutions.

    Personally I do feel though that this is always going to be a losing battle until the people receiving the spam actually stop reading it. The truth is that spamming makes someone a lot of money. Which means there are people who act on spam messages. We will therefore always have spam and we just need to be pragmatic about it. Most spam filters are reasonably good. I certainly have few problems with spam and my email has been plastered all over the web for the last 10 years!

  58. I suppose preventing spam from being lucrative is a good idea, but curing it is much easier nowadays. Greylisting incoming email has proved itself where I work – 90% (dare I say even more than that) of all spam will filter out just by using greylisting. The rest will go through the common spam filters and won’t survive that filter. The few that do come through are filtered out by Thunderbird’s filter that I’ve trained over the months.

    I used to receive up to 600 spam mails each monday (read: after a weekend of not downloading my mail), it’s down to an average of 2 now.

  59. I’m working on a Drupal site for a non-profit organization where many folks who don’t know how to construct a mailto link will be updating the site. Drupal automatically converts email addresses to links, which is handy, but it doesn’t obfuscate the addresses at all. I found some code that’s similar to this solution (http://drupal.org/node/62881), but both that code and the code featured here choke when at symbols are used in unconventional ways.

    In Spanish, many words are gendered (for example, _ellos_ is the masculine form of “they” while _ellas_ is the feminine.) Some folks want to get away from this default gendering, and to do so, they replace the “a” or “o” in gendered words with “@.” Linguistic quibbles and obscurity aside, it’s something I need to accommodate on the site that I’m working on. However, both this code and the Drupal-specific code I found mangle words like ell@s, interpreting them as email addresses. I think it’s something to do with the way the regular expression is constructed, but for the life of me I can’t figure out how to make it differentiate between example@domain.com and ell@s.

  60. @Jack: You might try replacing @ with “a” or “o” and running it through a spellcheck. If it passes, it’s a word. If it fails, it’s probably an email address. You might also check for the existence of both “@” and “.” in the same word.
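
    A minimal sketch of the second suggestion, in a slightly stricter form than "both characters somewhere in the word": the dot must appear inside the part after the "@", with characters on both sides of it. The function name `looksLikeEmail` is hypothetical, and this is a heuristic, not the regular expression used by the article's script.

```javascript
// Hypothetical heuristic: decides whether a single word should be treated
// as an e-mail address or as ordinary text such as "ell@s".
function looksLikeEmail(word) {
  const at = word.indexOf("@");
  if (at <= 0) return false; // needs at least one character before "@"
  const domain = word.slice(at + 1);
  const dot = domain.lastIndexOf(".");
  // The domain needs a dot with something before and after it.
  return dot > 0 && dot < domain.length - 1;
}
```

    Requiring characters after the last dot keeps a word like "tod@s." at the end of a sentence from being mistaken for an address.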

  61. Brian,

    I loved the idea of using TinyURL. Unfortunately, I just discovered that safari kicks out a redirect error.

    Off to try plan b.

    bob

  62. Although I love the content of this article (the technical side), it’s like shooting a small bird with a Kalashnikov when you can simply throw a stone at it! I once asked a friend of mine: why don’t you do anything regarding the spam you’re getting? And he goes: spam is a good way to know your email is actually working! 🙂
    The point as mentioned above, you want to kill spam? Stop reading it. As for using the email as a link, I always found that too un-friendly for users, since most probably I do not wish to launch my email browser, I just want to copy and paste later in Yahoo for example… a true useful contact is by means of a form…that said, email images are probably the best.

  63. I like to see creative people coming up with solutions to common problems like spam-n-such.

    Everybody has different needs and I think this is a fine method of helping to prevent spam on a site where you may not be able to control the recipient’s spam filter and/or adding a form isn’t an option. Different Needs.

    In re::mailto links, how about adding a tiny icon after each email that copies the address to your clipboard? I would use something like that as not to launch my e-mail client. I don’t know if this has already been thought of, although I’d have to believe it has. It’s the little things.

  64. I’m a bit disappointed that writer of article published on alistapart.com was not aware that + is absolutely allowed in email addresses. Yeah, it was good article, but every web “professional” should know the “plus in email” fact.

  65. @Jaakko Holster: firstly, I’d like you to read “this comment”:http://www.alistapart.com/comments/gracefulemailobfuscation?page=2#14.

    Secondly: sure, according to the official RFC, a plus sign is allowed in e-mail addresses, but:

    a) In reality, e-mail service providers don’t allow users to create addresses that contain a plus sign (!)

    b) I’m perfectly aware that ‘plus addressing’ is an interesting and commonly used technique (by geeks, at least) to tag/filter incoming mail and/or to backtrace where spammers got your address from, but these are not the addresses that you publish on a public website; you use plus addressing when signing up for online services, newsletters etc. Don’t forget that the plus sign is *not* a part of the actual address.

    c) However, if you insist, you *could* very easily adapt the regular expressions that the GEO technique uses, to separate the name/domain/tld in an e-mail address with something else (a ‘/’ would be a good idea).

    I hope this helps 😉

  66. Are any of the current bots smart enough to catch something like this?

    <span>address</span><span style="display: none;"></span><span>@<a>example.com</a></span>

    The address is not clickable (which is actually preferable to me), but if you select it and copy it for pasting into your e-mail client, it only pulls the e-mail address.

  67. Are any of the current bots smart enough to catch something like this?

    address</span>@example.com

    The address is not clickable (which is actually preferable to me), but if you select it and copy it for pasting into your e-mail client, it only pulls the e-mail address.

  68. I do like the fact that people are always trying new methods (such as this one) yet the main disadvantages of this method as far as I can see are:
    * developer time to implement this on their site
    * nearly attempting to apply a method that is a one-size-fits-all: every site comes with a different user base.
    * falling back to a case where it is not user friendly (i.e. how many processes do users have to go through before they get the email?) in absence of Javascript
    * the + is a server setting which requires investigation by the developer
    * simply not being able to copy/paste an email

    I’ve contacted ALA a few years ago several times, when I was about to publish an article which was a compilation of “methods to hide emails from the page source”:http://www.csarven.ca/hiding-email-addresses ; talking about the pros and cons for each method and the impact on the resource requirement to beat the spammers.

    My question to you is why did I not get any response from you and why am I reading this: http://alistapart.com/articles/gracefulemailobfuscation/ now?

    Don’t get me wrong, I do appreciate the effort that went into writing this article.

  69. Hi, just don’t forget that the “+” sign is used for Gmail filtering which may cause issues with this particular method.

    “A “+” is typically not allowed in real e-mail addresses and it doesn’t have to be URL-encoded—which will come in handy later on.”

    e.g. email.address+ala@gmail.com

    I would suggest a “$” instead as this is an invalid email character.

    Great article nevertheless. I love your site and the great reading it provides.

  70. Sorry, i just read through the comments and i noticed you had already made mention of the “+” sign in email addresses.

  71. No server access means no way to create the non-javascript option. For these instances, and the ones where the client insists in showing their email in plain, I no longer worry about it but use an email address I don’t mind changing every 6 or 12 months (or when spam is taking over). As the email address on the website in general is not the main email address in the first place, but just a first contact one this approach seems to work quite well.

  72. I like what this script is doing, but there’s a big problem with using “window.onload” because it conflicts with any other script(s) that you’re using on a page.
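
    One common way around the conflict is to register the handler instead of assigning `window.onload` directly. The sketch below is an assumption about how that could look, not a patch to the article's geo.js; the names `addLoadHandler` and `initObfuscation` are hypothetical.

```javascript
// Hypothetical helper: adds a load handler without overwriting existing ones.
function addLoadHandler(target, handler) {
  if (typeof target.addEventListener === "function") {
    // Standard event registration: any number of listeners can coexist.
    target.addEventListener("load", handler, false);
    return;
  }
  // Fallback for environments without addEventListener:
  // wrap whatever was already assigned to onload.
  const previous = target.onload;
  target.onload = function (event) {
    if (typeof previous === "function") previous.call(this, event);
    handler.call(this, event);
  };
}

// In a page this would be called as, for example:
//   addLoadHandler(window, initObfuscation);
```

    With this in place, other scripts on the page keep their own load handlers and each one still runs.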

  73. Actually I don’t think my previous post has to do with Drupal. I just tried installing this on an empty site and the JavaScript won’t kick in? Can anyone think of something obvious I might be missing?

  74. I’m also having a hard time with Javascript not kicking in. I’m not using a Drupal site either! I’ve tinkered for over an hour and still have no idea why it won’t kick. Even the simplest test page (no additional scripts) does not work.

    I also get a message in Firebug that the geo.js has an error: missing ; before statement
    [Break on this error] var tooltip_js_off = ‘To reveal this e-mail address, you’ll need to answer a si…

    Nice idea. I love the idea…

  75. Why, the article is great and pretty helpful! I used to get a lot of spam for my e-mail address being quite public… But I resolved my problem in much easier way – just signed up with Gafana, I don’t get spam any more. that’s it, guys.

  76. The solution described in the article is great; however, I think it is too much of an effort to implement. What I don’t want is to waste more time on spam scumbags than necessary.

    Spamspan [1] is a nice, simple, pretty clean solution using JS+CSS. It degrades nicely without JS turned on and is easy to customize, so it is the solution of my choice. Especially because there is a Drupal [2] module [3] for it.

    [1] http://spamspan.com http://spamspan.de
    [2] http://drupal.org
    [3] http://drupal.org/project/spamspan

  77. This seems to be an attractive technique, but I have concerns about any added pre-processing and post-processing time. If each file has to run through this filter before it is served, and then processed by another javascript function, does this noticeably affect the page load time?

  78. Roel,
    why don’t you use your technique on your own sites, like Anysurfer.be? Is it not that accessible after all?

  79. My method of displaying email addresses is simple enough that it doesn’t require heavy scripting nor massive legwork on the user’s part.

    “If you would like to get in contact with me, you may email _gibson_ at the domain this site currently resides.”
    (I haven’t really put much time into this short line. I just wanted to get the point across.)

    I feel that displaying an email in a more cunning fashion, such as this, can accomplish a number of goals when trying to filter email.

    # *The amount of spam is greatly reduced*, of course. As far as I know, no currently effective bots can grab an email address from this text.
    # *The amount of “superfluous” messages is greatly reduced*. I feel that all other messages can go to the comments box if a user is too lazy to manually type in an email address if they would like to speak with me. (Please tell me if I’m biased or just plain wrong.)
    # *Completely cross-browser compatible*. No scripting == less browser-incompatibilities.

    This method has proven to be quite useful and has eliminated 100% of spam messages (not to mention a number of unneeded ones).

  80. Excellent article. Unfortunately I have to work with an undocumented proprietary content management system written in ASP. I have come up with a simple email obfuscator based on numeric character references, JavaScript and CSS. Take a look at my blog post at http://www.pixelwisedesign.com/blog/?p=40 if you are in a similar situation and cannot utilize a server side language.

  81. For those of you wanting to use this with WordPress, amazingly there did not seem to be any implementations of quite this method as of a week ago. I knocked a rudimentary version up on “WordPress.org plugins”:http://wordpress.org/extend/plugins/graceful-email-obfuscation/ which should do the job. Contact me if anyone is interested in improvements, which I would be happy to bang in if you want to use it on a bigger site.

    I have made some slight changes to the method to hook into WP’s processing, avoiding Apache dependency, and tweaked the encoding method a little.

    For details and links to any future plans, I will update “the post on my site”:http://www.nicholaswilson.me.uk/2010/04/notes-on-good-email-obfuscation/ if I come back to this.

  82. I wrote an email obfuscation routine at http://www.php-ease.com/functions/email_link.html that does what you did, but steps it up a notch. I wrote a function that base64 encodes the entire mailto link and I put it in the title attribute. When a user hovers over it with a mouse (yeah, I know, not everyone has javascript – and for this I don’t care), THEN the title is decoded, placed into the href attribute before they click on the link, and when their mouse leaves the anchor area it then becomes obfuscated again. To step it up a notch, I also rot13 it, but that’s not included in the script I provide. This solves all of the problems of +’s, extra periods, even chinese characters.
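
    The hover-to-decode idea in this comment can be sketched roughly as follows. This is not the code from the linked page: the function names are hypothetical, rot13 is left out, and the address is passed through `encodeURIComponent` first so that `btoa` only ever sees ASCII. `btoa` and `atob` are available in browsers and in recent Node.js versions.

```javascript
// Hypothetical encoder, run when the page is generated.
function encodeMailto(address) {
  return btoa(encodeURIComponent("mailto:" + address));
}

// Hypothetical decoder, run in the browser when the link is hovered.
function decodeMailto(encoded) {
  return decodeURIComponent(atob(encoded));
}

// Wires up one link whose title attribute holds the encoded value.
function protectLink(link) {
  const encoded = link.title;
  link.addEventListener("mouseover", function () {
    link.href = decodeMailto(encoded); // real address only while hovered
  });
  link.addEventListener("mouseout", function () {
    link.removeAttribute("href"); // obfuscated again
  });
}
```

    As the comment itself concedes, this variant depends entirely on JavaScript and on mouse events, so it has no fallback for other users.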
