Win the SPAM Arms Race

by Dan Benjamin, May 24, 2002

Published in HTML, JavaScript, The Server Side

Most seasoned web designers and developers have learned that posting an email address on a website is a sure-fire way to guarantee a steaming pile of spam delivered to that address for years to come.

Indeed, posting a naked email link anywhere on the web (or in a newsgroup, in a chatroom, on a weblog comments page …) is generally the kiss of death for your once-healthy address.

INVASION OF THE SPAMBOTS

It begins innocently enough: the neophyte web developer codes his address into a fresh, new web page to solicit the feedback of his adoring fans. “Email me!” it beckons. A short time passes. Then the barrage of email begins.

From:   john@48_93839aac6673030.com
Subject: Make 80k working from home ...

The first few spam emails seem entertaining. Then frustrating. Following the so-called “unsubscribe” links in each mail only results in more mail. Eventually, the task of separating valid email from junk becomes so time consuming and problematic that the developer is forced to abandon the email address entirely.

I know this because the developer was me – and to this day, years later, that email address still receives dozens of unwanted email messages every day.

Have you ever wondered how, almost instantly, your email address is discovered, recorded, handed down, and passed around? The culprits are hordes of Email Harvesting Robots (aka spambots). These autonomous bots spider the web day and night in waves, crawling pages and following links until they discover an unsuspecting mailto: link. Then they pounce, devouring the address and sending it deep into the bowels of the web where the ugly, festering spam companies dwell.
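The harvesting step itself takes very little code. Below is a minimal sketch, assuming the bot has already downloaded a page's HTML into a string; the function name `harvestMailtoLinks` is hypothetical, and the crawling and link-following a real bot performs are left out.

```javascript
// Minimal sketch of a spambot's core step: pull addresses out of
// mailto: links in raw HTML. Fetching and link-following are omitted.
function harvestMailtoLinks(html) {
  const found = new Set();
  // Match href="mailto:..." or href='mailto:...' and capture everything up
  // to the closing quote, whitespace, or a "?" that starts ?subject=... etc.
  const pattern = /href\s*=\s*["']mailto:([^"'?\s]+)/gi;
  for (const match of html.matchAll(pattern)) {
    // Lower-case so the same address in different case is stored once.
    found.add(match[1].toLowerCase());
  }
  return [...found];
}

const page = '<p>Feedback? <a href="mailto:You@Example.com">Email me!</a></p>';
console.log(harvestMailtoLinks(page)); // [ 'you@example.com' ]
```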

TENTATIVE MEASURES DON’T HELP

Having learned your lesson, unwilling to post your naked address on a page again, you attempt to buck the system, removing any possibility that the spambots will detect your address. Instead of providing an actual link, you type:

you at example dot com

While this will fool almost all spambots, it fails entirely to give your audience an actual, clickable email link. Your viewers will have to read and remember the text and then type it into their email program manually. Not a big deal, but this might be the extra step that stops them from contacting you.

Your goal is to make their life easier, not harder; to encourage, not discourage, contact. So you abandon this technique and move on to something better. But what?

TO EMBED OR NOT TO EMBED

Your next move might be to embed the mailto: link within a block of JavaScript, like this:

<script type="text/javascript">
document.write("<a href=\"mailto:you@example.com\">email me</a>");
</script>

At first, this would appear to be a suitable solution. Web browsers will execute the JavaScript and display your link correctly, while spambots should, in theory, ignore the JavaScript and therefore never see the link to your email address.

Unfortunately, spambots can read your address without executing any code at all, because the address still sits as plain text in the page source, ripe for the picking.

Drew McLellan proved this to us all with his Encoded email address harvester, a handy tool that harmlessly scans your web page in search of vulnerable addresses. Run it against your existing web pages and see what it finds. The results should prove interesting.
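The reason is easy to demonstrate. A harvester does not need to run the script, or even recognize tags; a pattern that matches address-shaped text anywhere in the source is enough. The sketch below assumes a deliberately simple pattern and a hypothetical function name, `scanSource`; real harvesters vary.

```javascript
// A harvester that treats the page as plain text: no HTML parsing,
// no script execution, just a search for address-shaped strings.
function scanSource(source) {
  const addressPattern = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;
  return source.match(addressPattern) || [];
}

// The JavaScript-wrapped link, exactly as it would appear in the page source.
const wrapped = [
  '<script type="text/javascript">',
  'document.write("<a href=\\"mailto:you@example.com\\">email me</a>");',
  "</script>",
].join("\n");

console.log(scanSource(wrapped)); // [ 'you@example.com' ]
```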

DOUBLE PROTECTION: AS GOOD AS IT GETS

The best solution builds on the JavaScript-wrapping idea, but rather than leave the naked email address exposed within the easily parsed JavaScript, it first encodes the address, translating each character into its numerical equivalent, and then wraps the result.

You see, every character can be mapped to its own numeric code (a numeric character reference) which, when viewed by a web browser, is translated back into that character. The following table gives you an idea of how this works:

char  code
a     &#97;
b     &#98;
c     &#99;
1     &#49;
2     &#50;
3     &#51;
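Each code is just the character's numeric value written as `&#NNN;`, so the conversion is easy to script. A minimal sketch (the name `toNumericEntities` is illustrative):

```javascript
// Replace every character with its decimal numeric character reference,
// e.g. "a" -> "&#97;". Browsers decode these back when rendering.
function toNumericEntities(text) {
  let out = "";
  for (const ch of text) {
    out += "&#" + ch.codePointAt(0) + ";";
  }
  return out;
}

console.log(toNumericEntities("abc123")); // &#97;&#98;&#99;&#49;&#50;&#51;
console.log(toNumericEntities("mailto:you@example.com"));
```

Encoding the `mailto:` prefix along with the address keeps the literal string `mailto:` out of the page source as well.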

Using this technique, you can generate a complete link that will be rendered correctly by the web browser. Wrapping these codes yet again within JavaScript will further limit the chances that your email address will be assimilated.
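The two layers can be combined in a few lines. This is a sketch, not the Automatic Labs encoder: it assumes `document.write()` output is acceptable on your page, and the names `encode` and `buildProtectedLink` are made up for illustration.

```javascript
// Encode each character as a decimal numeric character reference.
const encode = (text) =>
  Array.from(text, (ch) => "&#" + ch.codePointAt(0) + ";").join("");

// Build a <script> block whose document.write() call emits a mailto link
// in which both the URL and the link text are entity-encoded.
function buildProtectedLink(address, linkText) {
  const anchor =
    '<a href="' + encode("mailto:" + address) + '">' + encode(linkText) + "</a>";
  // JSON.stringify yields a safely quoted JavaScript string literal; "</"
  // is then written as "<\/" so no "</" sequence appears inside the
  // script, which HTML 4.01 validators would flag.
  const literal = JSON.stringify(anchor).replace(/<\//g, "<\\/");
  return (
    '<script type="text/javascript">\n' +
    "document.write(" + literal + ");\n" +
    "</script>"
  );
}

console.log(buildProtectedLink("you@example.com", "email me"));
```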

NO ONE IS SAFE

Unfortunately, no email address published online is entirely safe from robots that harvest addresses, but converting your email address to numerical equivalents and then wrapping the result in JavaScript should foil all but the smartest and most dedicated spambots.

AUTOMATING ANTI-SPAM PROTECTION

Converting your email address into an encoded and JavaScript-wrapped link by hand is a tedious process. Fortunately, almost every programming language makes this transformation easy to automate.

This author’s Automatic Labs Email Address Encoder, for example, is an online resource that will handle this process for you. Just enter your email address and the link information, and it will do the rest. You may also download a stand-alone program to run on your own computer. Both versions create the same results, and even allow you to specify cross-browser XHTML or HTML 4.01 code that validates.

On this code page, you’ll find a simple PHP function that will handle this transformation. Enjoy! {The product has been updated since this article was published in ALA issue 145. To download the latest version, see the Automatic Labs products page. – Ed.}

78 Reader Comments

Dean Peters says:

June 14, 2002 at 7:54 pm

I dunno, I like my little obfuscator
http://www.healyourchurchwebsite.com/obfuscator/

The mailto: disappears as plain text, the @ disappears as plain text, and there is a random mixture of ASCII (decimal) and hex encodings. This has apparently worked well enough over the past few years on a church site I’ve developed that our spam count is very low – usually guys clicking on the address and letting it rip.

That said, I am GLAD there is more than one way to skin this cat. If there were one best way, you can bet your bottom dollar the spammers would be coding up a storm to get past it. Instead, because we have several anti-spam tools out there, the targets are too numerous to overcome – like a grasshopper taking on a bunch of little ants (we’re the guys in black !-)
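The randomized mix Dean describes might look something like the sketch below. The 50/50 choice per character and the optional `random` parameter (added so results can be reproduced) are assumptions; the name `mixedEntities` is illustrative, and his obfuscator's actual logic is not shown in this thread.

```javascript
// Encode each character as either a decimal (&#64;) or hexadecimal (&#x40;)
// numeric character reference, chosen at random per character.
function mixedEntities(text, random = Math.random) {
  return Array.from(text, (ch) => {
    const code = ch.codePointAt(0);
    return random() < 0.5
      ? "&#" + code + ";"
      : "&#x" + code.toString(16) + ";";
  }).join("");
}

console.log(mixedEntities("mailto:you@example.com"));
```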
Joseph F. Ryan says:

June 18, 2002 at 9:45 pm

Of course, you can always make your email address as plain as day, and use this little beauty:
http://www.perlmonks.org/index.pl?node_id=103656

On a more serious note, if encoding efforts happen to fail (or you just couldn’t resist signing up for that “free” “toothbrush”), there is SpamAssassin (http://www.spamassassin.org), whose extremely powerful Perl-driven parser can filter out more than 90% of spam.
Joseph F. Ryan says:

June 18, 2002 at 10:16 pm

What you have to remember is that spambots aren’t people reading your page; they’re automated robots. They don’t SCAN the page for email addresses, they PARSE it. That means they look through the source code for email addresses, not the rendered text; depending on the parser, an address in a meta tag is no different from one in a link. It all depends on how the spambot does the parsing. For instance, it may just look at links in the page:

#!/usr/bin/perl -w
use strict;
use HTML::TokeParser;
use LWP::Simple;

# Collect addresses found in <a> tags only: mailto: hrefs and link text.
sub grab_email_using_links
{
    my ($url) = @_;
    my $content = get($url) or return;             # fetch the page
    my $parse = HTML::TokeParser->new(\$content);  # in-memory HTML needs a scalar ref

    my @addresses;
    while (my $token = $parse->get_tag("a"))
    {
        my $href = $token->[1]{href} || "";
        my $text = $parse->get_trimmed_text("/a");
        # Address part of a mailto: URL, stopping before any ?subject=...
        push @addresses, $1 if $href =~ /^mailto:([^?\s]+)/i;
        # Anything address-shaped in the visible link text
        push @addresses, $1 if $text =~ /([\w.+-]+@[\w-]+(?:\.[\w-]+)+)/;
    }
    return @addresses;
}

However, what if he just looks at the source code in general, hoping to pick some out of the body text? The address finding is then greatly simplified:

#!/usr/bin/perl -w
use strict;
use LWP::Simple;

# Scan the raw source for anything address-shaped, inside tags or not.
sub get_email_from_page
{
    my ($url) = @_;
    my $content = get($url) or return;
    my @addresses = $content =~ /([\w.+-]+@[\w-]+(?:\.[\w-]+)+)/g;
    return @addresses;
}

Even if you don’t know Perl, you can at least see that the above is most definitely not a lot of code; if a spammer REALLY wanted your address, I’m sure he’d be able to get around many of the solutions you are posting…

As you can see, if you are serious about protecting yourself, it’s pretty stupid to post your email address in any shape or form (of course, spambots don’t tend to go after personal sites, so I wouldn’t be too worried about posting your email address on your homepage). A form-based mailer is much safer, and pretty much a necessity for larger sites. The best (meaning most secure and featureful) is the NMS (http://nms-cgi.sourceforge.net) formmail (and the brand-new TFMail, formmail’s big brother). To answer the argument that form-based mailers don’t give the sender a copy of the email: formmail can be configured to email the sender a copy, or simply to display the message on screen (in which case it’s their own damn fault if they don’t save it! :P)
Anthony J. Mills says:

June 20, 2002 at 12:20 pm

Just for fun, I wrote a page of JavaScript to change

name@example.com

to

The relevant JavaScript functions on the generation page were

// Return a string that's reversed with bit 0 flipped
function Flip(s)
{
var i;
var r = '';
for (i = s.length; i; i--) {
r += String.fromCharCode(s.charCodeAt(i - 1) ^ 1);
}

return r;
}

// Update the form
function Refresh()
{
var s = '' + document.all.LinkText.value + '';
document.all.ResultPlain.value = s;
document.all.ResultObfuscated.value = '';
}

As mentioned before, this works until the spambots figure out how to run JavaScript.
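Since `Flip` reverses the string and flips bit 0 of each character, applying it twice returns the original. Presumably the page stores the flipped link and calls `Flip` again at load time; the generated page code did not survive in this comment, so the `document.write` line below is an assumption. Flipping can produce quote or backslash characters (`#` becomes a double quote, `&` a single quote, `]` a backslash), so the stored string is escaped here with `JSON.stringify`.

```javascript
// Same transformation as Flip() above: reverse the string and
// XOR each character code with 1.
function Flip(s) {
  let r = "";
  for (let i = s.length; i; i--) {
    r += String.fromCharCode(s.charCodeAt(i - 1) ^ 1);
  }
  return r;
}

const link = '<a href="mailto:name@example.com">name@example.com</a>';
const stored = Flip(link);

// Flipping twice restores the original, so the page needs only Flip()
// plus the stored string. JSON.stringify escapes any quotes or
// backslashes that flipping produced.
const snippet = "document.write(Flip(" + JSON.stringify(stored) + "));";

console.log(Flip(stored) === link); // true
console.log(snippet);
```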
Stephen says:

June 20, 2002 at 11:13 pm

Spam is one of the main factors holding back the potential of the internet. Hotmail accounts are generally where most people begin their experience with email, and considering the amount of crap that gets through to the average Hotmail account, people just aren’t going to take it seriously. Fortunately, there are moves under way to make spamming illegal in parts of Europe, and hopefully the rest of the world will follow suit. I can’t imagine how spam can be an effective way of marketing when everyone sees it as an infringement on their privacy.
David says:

June 21, 2002 at 2:29 am

I’ve had great success just listing my email like this:

mailto:name%40domain.com

Have yet to get spammed. Quite nice actually. Quite simple too. The best protection is, of course, a form submitted to the server.
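This works because `%40` is the percent-encoded form of `@`, which mail clients decode when the link is followed. A sketch of both sides, assuming a harvester that only looks for a literal `@`; a bot that percent-decodes links before scanning would not be fooled. The function name `mailtoHref` is hypothetical.

```javascript
// Percent-encode only the "@" of an address for use in a mailto: href.
function mailtoHref(address) {
  return "mailto:" + address.replace("@", "%40");
}

const href = mailtoHref("name@domain.com");
console.log(href); // mailto:name%40domain.com

// A naive harvester that requires a literal "@" finds nothing...
const naive = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/;
console.log(naive.test(href)); // false
// ...but one that decodes the URL first recovers the address.
console.log(naive.test(decodeURIComponent(href))); // true
```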
Sergi says:

June 21, 2002 at 6:50 am

Hi all,

here we go with the easiest fix ever:

instead of name@domain.com use the following:
name%40domain.com

It doesn’t need any JavaScript (not everyone browses with JavaScript ON!!!) and is still fully functional. This helps standards compliance and the new way of supplying information through the web. I got it from (a personal comment by David at) http://stilleye.com

Ciao.
Sergi says:

June 21, 2002 at 6:51 am

I promise I hadn’t read the comment above before posting 😉
Dean Peters says:

June 21, 2002 at 8:48 am

I think I stated back on page 2 of this discussion that anyone handy with LWP and HTML::Entities could snarf up e-mail addresses that were only hex encoded. This is why I randomize between hex and ASCII encoding. Not foolproof by any means, but it requires enough extra coding to get overlooked by all but the most determined ‘bots … and for them, I have some SSI-induced chaff in their way, so by the time they get to a real e-mail address, it’s lost in the white noise or they’ve given up.

But like I said in my last post, I hope EVERYONE here implements a variety of tactics. If we all did the same thing, they’d easily code around that one solution. By offering a mix of methods, we keep ’em spinning their wheels and hopefully drive up their cost of doing business.
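Dean's caveat holds for mixed encodings too: decoding both decimal and hex references takes only a few lines, so the mix raises a harvester's cost rather than stopping it. A sketch of the harvester's side (the name `decodeNumericEntities` is illustrative, and named entities such as `&amp;` are deliberately not handled):

```javascript
// Decode decimal (&#64;) and hexadecimal (&#x40;) numeric character
// references back into characters. Named entities are left untouched.
function decodeNumericEntities(html) {
  return html.replace(/&#(x[0-9a-f]+|\d+);/gi, (_, num) =>
    String.fromCodePoint(
      num[0] === "x" || num[0] === "X"
        ? parseInt(num.slice(1), 16)
        : parseInt(num, 10)
    )
  );
}

const mixed = "&#121;&#x6f;&#117;&#x40;example.com";
console.log(decodeNumericEntities(mixed)); // you@example.com
```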
Paul Neave says:

June 26, 2002 at 5:08 am

Anyone tried out Cloudmark (http://www.cloudmark.com) ? I’ve just signed up. Not server-side, but it is a step in the right direction.
Robin says:

June 28, 2002 at 11:58 pm

I am trying out the hivelogic method displayed.

However, the displayed name is coloured in a way that does not suit my dark background. Could someone show me the exact code, and where it should go in the generated code, so I can choose the colour? I know nothing about coding.

Many thanks

Robin
Brett says:

July 16, 2002 at 5:11 pm

I created a database with MySQL/PHP that stores the email addresses, which are never viewable on the website (not even by viewing the source). Am I still at risk?

I’m also in the process of configuring my email server to block everyone in the Open Relay Database (http://ordb.org). Also, I’m using a tool called MailScanner (mailscanner.info), which scans all emails for viruses and has a SpamAssassin plugin.
Anonymous says:

July 17, 2002 at 3:45 pm

As noted above, any technique that uses client-side JavaScript is useless: the end user can turn it off.

That’s why you use .asp or another server-side solution if you want to foil spambots.
Sarah Knudsen says:

August 13, 2002 at 7:45 am

I agree that the reliance on client-side JavaScript is a problem; however, it’s possible to get around it using something like this after the place where you’ve embedded the JavaScript:

And just make the “How do I use this address?” text a link to a page containing instructions (using a dummy email address, of course!).
thomas says:

August 24, 2002 at 1:34 pm

I see all these 20+ lines of code just to hide email addresses from HTML… it’s so simple to just publish the damn thing in .swf format and stick it in your page
T-Dub says:

September 2, 2002 at 10:52 am

The Way

“The Way is shaped by use,
But then the shape is lost.
Do not hold fast to shapes
But let sensation flow into the world
As a river courses down to the sea.”
Tao Te Ching; 32 Shapes

I know how to do this in php but you could use anything that makes this possible.

When the client clicks on an email link, a box pops up asking them to enter their email address and then the site emails them the address so all they have to do is reply to that email.

Easy.

All data is stored in a password-protected MySQL database, so only the PHP on the server can access it.

It also keeps both email addresses off the web entirely, thus bypassing the spam issue.
Wolverine says:

September 14, 2002 at 12:31 pm

Another solution, used by BeSweet’s author, is simply to replace an email link with a link to a forum page where one can leave a message on the system for the user. In my case I have forums on my site and private messages (phpnuke), so I can do that for myself too. I just did that today – what a coincidence.
Lars at the beach says:

October 14, 2002 at 4:02 am

simplify the javascript code to:

email me
Denis in Seattle says:

January 8, 2003 at 11:26 pm

Here’s the weakness, and a suggestion:

IT SEEMS TOO EASY to write a script that will harvest any consistently applied technique of obscuring mailto addresses. The trick is for us all to USE SOMETHING DIFFERENT. Use SSI to create throw-away e-mail addresses from some part of the user’s IP address, or use PHP to use the time of day. Mix this up with the break-apart technique, but don’t break the address at logical places. Throw in a little encoding, here and there. Keep the harvesters on their toes! Make it easier to get their addresses from other sites. The hard work they’ll go to, just for a handful of our obscured addresses, won’t be worth it.

In general I use and recommend the “caller ID” method, creating a custom e-mail address for myself each time I register for web sites, etc. I know from whom the mail came, by the address they sent it to, and can easily filter it out. Example: ebay.me@mydomain.com

I’m also using SpamAssassin, which is great by itself. I learned how to write a user_prefs file, and how to write simple procmail recipes. Together, it’s really, really effective. Because we can’t stop’em all from getting our e-mail addresses.
David Smith says:

January 21, 2003 at 11:50 pm

Blocking those nasty spambots is a real pain I agree. I have seen bots that go through the trouble of parsing email addresses after processing the web page through a browser. So how do we kill it in my shop and get even?

The Block: We have email sent to us through form submission. IP addresses are logged with the date to prevent abuse (only 3 messages an hour). This is an easy script to write, and you do not have to show the address in a .cgi file.
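The per-IP limit in “The Block” (three messages an hour) can be sketched as a sliding window. The class name `SubmissionLimiter` is hypothetical and the log lives in memory; a CGI script that starts fresh on every request would need to keep it in a file or database instead.

```javascript
// In-memory sliding-window limiter: at most `limit` submissions per IP
// within `windowMs` milliseconds.
class SubmissionLimiter {
  constructor(limit = 3, windowMs = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.log = new Map(); // ip -> timestamps (ms) of recent submissions
  }

  // Returns true and records the submission if the IP is under the limit.
  allow(ip, now = Date.now()) {
    const recent = (this.log.get(ip) || []).filter(
      (t) => now - t < this.windowMs
    );
    if (recent.length >= this.limit) {
      this.log.set(ip, recent);
      return false;
    }
    recent.push(now);
    this.log.set(ip, recent);
    return true;
  }
}

const limiter = new SubmissionLimiter();
const t0 = Date.UTC(2003, 0, 21, 12, 0, 0);
console.log([1, 2, 3, 4].map((i) => limiter.allow("203.0.113.7", t0 + i * 1000)));
// [ true, true, true, false ]
console.log(limiter.allow("203.0.113.7", t0 + 60 * 60 * 1000 + 5000)); // true
```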

The Kill: Knowing what these bots look for, and how they operate, is key. They spider hyperlinks across your entire site wherever an email address is likely. We have a generator of bogus mailto: addresses (like bsadress@bsdomain.com), named directory.cgi and hyperlinked from our contacts section.
Imagine if the spammer were caught during the mailout by getting tens of thousands of mailer-daemon bounces. Their server admin would catch them before the complaints did.
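A decoy generator along the lines of “The Kill” could look like the sketch below. It is hypothetical and makes one deliberate change: the addresses use the reserved `.invalid` top-level domain (RFC 2606), so no real domain or mailbox ends up on a spam list, and deliveries to them simply fail. The `random` parameter exists only so the output can be reproduced.

```javascript
// Generate bogus mailto: links to feed harvesters. Addresses use the
// reserved .invalid TLD, so they can never reach a real mailbox.
function decoyLinks(count, random = Math.random) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  // A random lower-case word of the given length.
  const word = (len) =>
    Array.from({ length: len }, () =>
      letters[Math.floor(random() * letters.length)]
    ).join("");
  const links = [];
  for (let i = 0; i < count; i++) {
    const address = word(8) + "@" + word(6) + ".invalid";
    links.push('<a href="mailto:' + address + '">' + address + "</a>");
  }
  return links.join("\n");
}

console.log(decoyLinks(3));
```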

The link provided isn’t my site, just one that I found that had these scripts for free/practically nothing.

Granted, this article was intended to have this done in JavaScript, and in that case consider calling the function from an external .js file to generate the email addresses and display email links. BTW, if you feel you have to display an email address that visitors click on, use an image that is linked the way Xavier Defrang mentioned above if you have no CGI access.

Whew I think I have said Enough,

David Smith
Avitar says:

January 22, 2003 at 12:00 am

If you use cgi to provide web based email through forms, you do not have to display your email address. I do that in my shop and it works.

If you generate bogus random addresses like bs@bs.com, you will tip off the spammer’s sysadmin when they begin a mailing and get thousands of “address unknown” bounces, or waste their bandwidth if they are a service.

If you must display an email address, make an image that uses the method Xavier Defrang mentions above to link it, with the function as an external .js script.

The link is not my site, just a place I found where you can get these simple scripts if you are too lazy to write them.

Avitar
Tim the Logokleptomaniac says:

February 19, 2003 at 4:17 pm

Check out spamgourmet.com
It works the same as sneakEmail.com
Balu says:

March 3, 2003 at 3:51 pm

http://www.neilgunton.com/spambot_trap/

Just search for “Balu” to find my PHP solution, which generates a unique mailto: address for each visitor that looks like

web-32bitIP.timestamp@example.com

This way I can easily reject addresses that were found by bots and are used for SPAMming. I even know where the bot came from and when. I can even find them in the webserver-logfiles and analyze their activity.

There are many other ideas and hints on that page too…

Balu
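Balu's per-visitor address can be sketched as follows. The exact format is an assumption based on his example: the IP address is written as an unsigned 32-bit decimal integer and the timestamp as Unix seconds. The function names are hypothetical; `traceAddress` shows how a spammed address leads back to the visitor who was served it.

```javascript
// Convert a dotted IPv4 address to an unsigned 32-bit integer and back.
function ipToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function intToIp(n) {
  return [24, 16, 8, 0].map((shift) => Math.floor(n / 2 ** shift) % 256).join(".");
}

// Build web-<ip>.<timestamp>@domain for one visitor.
function visitorAddress(ip, unixSeconds, domain = "example.com") {
  return "web-" + ipToInt(ip) + "." + unixSeconds + "@" + domain;
}

// Recover who was served an address that later receives spam.
function traceAddress(address) {
  const m = /^web-(\d+)\.(\d+)@/.exec(address);
  if (!m) return null;
  return { ip: intToIp(Number(m[1])), servedAt: new Date(Number(m[2]) * 1000) };
}

const addr = visitorAddress("203.0.113.7", 1046706660);
console.log(addr); // web-3405803783.1046706660@example.com
console.log(traceAddress(addr).ip); // 203.0.113.7
```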
Alex says:

July 26, 2003 at 10:38 am

Just use a contact form that then mails you the content of the form. The user doesn’t need to know your e-mail address at all.
I use one at my site and users find it easy to use.
Cheers
Marek Moehling says:

September 12, 2003 at 10:50 am

By far the most elegant approach AFAIK:
Catch the email harvester in a tar pit and destroy its database
(including the email address just snatched from your HTML).

All mail links can remain unmodified.

http://www.monkeys.com/wpoison/
lithis says:

October 5, 2003 at 11:02 am

The easiest and as far as I know most fool-proof method is one that Xavier almost alluded to. It involves a simple Javascript function that assembles an email address when the user clicks a hyperlink.

function mail(user) {
    var locationstring = "mailto:" + user + "@" + "domain.com";
    window.location = locationstring;
}

You can of course add more parameters so that you can use multiple domains, e.g.:

function mail(user, dom, tld) {
    var locationstring = "mailto:" + user + "@" + dom + "." + tld;
    window.location = locationstring;
}

In the hyperlink, just call the function:

javascript:mail('johndoe', 'domain', 'com');

This method has proved exceptionally reliable. To test it, I put a page with a normal email link to spam@mydomain.com and one to my real address assembled through this Javascript function. Spam comes to the spam address, but not to my real address.

Hopefully this helps someone!
LJ says:

October 18, 2003 at 11:23 am

In pure HTML you can add extra tags that will not get in the way.
You can also use the character entities.
JavaScript would be needed to dynamically add this to the HREF of the mailto link.
You can even distribute the parts of your email address in invisible tags around the page, even put some bits in attributes, and use JavaScript to reconstruct it.

Problem with non-javascript browsers though.

Maybe have in the mailto href and on the page for a server-side solution. But don’t bots get to the page after server-side processing?
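LJ's idea of scattering the address can be sketched without a browser by modelling the pieces as `{ order, text }` pairs, as a script might collect them from hypothetical `data-` attributes around the page. Splitting at arbitrary points, rather than at `@` or `.`, follows Denis's advice above. The function name `reassemble` is illustrative, and as LJ notes, visitors without JavaScript get no link.

```javascript
// Reassemble an address from fragments stored out of order, e.g. in
// data-part/data-order attributes scattered around the page.
function reassemble(fragments) {
  return fragments
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.order - b.order)
    .map((f) => f.text)
    .join("");
}

const fragments = [
  { order: 3, text: "ple.c" },
  { order: 1, text: "ou@ex" },
  { order: 4, text: "om" },
  { order: 0, text: "y" },
  { order: 2, text: "am" },
];
console.log("mailto:" + reassemble(fragments)); // mailto:you@example.com
```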