Win the SPAM Arms Race

by Dan Benjamin

78 Reader Comments

  1. I dunno, I like my little obfuscator
    http://www.healyourchurchwebsite.com/obfuscator/

    the mailto: disappears as plain text, the @ disappears as main text, and there is a random mixture of ASCII and hex encodings that apparently has worked well enough over the past few years at a church site I’ve developed that our spam count is very low – usually guys clicking on the address and letting it rip.

    That said, I am GLAD there is more than one way to skin this cat. If there were 1 best way, you can bet your bottom dollar the spammers would be coding up a storm to get past it – instead, by us having several anti-spam tools out there, the targets are too numerous to overcome – like a grasshopper taking on a bunch of little ants (we’re the guys in black !-)
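
The random mix of decimal and hex encodings described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code behind the obfuscator linked above; the function name `obfuscate` and the injectable `rand` parameter are assumptions made for the example.

```javascript
// Encode every character as a numeric HTML character reference,
// picking decimal (&#64;) or hexadecimal (&#x40;) form at random.
// `rand` is injectable (hypothetical parameter) so output can be made repeatable.
function obfuscate(text, rand = Math.random) {
  return Array.from(text, (ch) => {
    const code = ch.codePointAt(0);
    return rand() < 0.5 ? `&#${code};` : `&#x${code.toString(16)};`;
  }).join("");
}

// The result contains no literal "mailto:" or "@" in the page source,
// yet a browser renders and follows it like the plain address.
console.log(obfuscate("mailto:name@example.com"));
```

Because the choice is random per character, two runs over the same address will usually produce different markup.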

  2. Of course, you can always make your email address as plain as day, and let live this little beauty:
    http://www.perlmonks.org/index.pl?node_id=103656

    On a more serious note, if encoding efforts happen to fail (or you just couldn’t resist signing up for that “free” “toothbrush”) there exists Spam Assassin (http://www.spamassassin.org), whose extremely powerful perl-driven parser can filter out more than 90% of spam.

  3. What you have to remember is that spambots aren’t people reading your page; they’re automated robots. They don’t SCAN the page for email addresses, they PARSE it. That means they look through the source code for email addresses, not the text; depending on the parser, an address in a meta tag is no different than one in a link. It all depends on how the spambot does the parsing. For instance, it may just look at links in the page:

    #!/usr/bin/perl -w
    use strict;
    use HTML::TokeParser;
    use LWP::Simple;

    sub grab_email_using_links
    {
    my ($url) = @_;
    my $content = get($url) or return;    # fetch the page; give up if the request fails
    my $parse = HTML::TokeParser->new(\$content);

    my @addresses;
    while (my $token = $parse->get_tag("a"))
    {
    my $href = $token->[1]{href} || "";
    my $text = $parse->get_trimmed_text("/a");
    # address inside a mailto: link (anything after '?' is subject/body)
    my ($email) = $href =~ /^mailto:([^?]*)/i;
    push (@addresses, $1) if (defined $email && $email =~ /([\w.+-]+@[\w-]+(?:\.[\w-]+)+)/);
    # address written out in the link text
    push (@addresses, $1) if ($text =~ /([\w.+-]+@[\w-]+(?:\.[\w-]+)+)/);
    }
    return @addresses;
    }

    However, what if he just looks at the source code in general, hoping to pick some out of the body text? The address finding is then greatly simplified:

    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;

    sub get_email_from_page
    {
    my ($url) = @_;
    my $content = get($url) or return;
    # every address-like string anywhere in the page source
    my @addresses = $content =~ /([\w.+-]+@[\w-]+(?:\.[\w-]+)+)/g;
    return @addresses;
    }

    Even if you don’t know Perl, you can at least see that the above is most definitely not a lot of code; if a spammer REALLY wanted your address, I’m sure he’d be able to get around many of the solutions you are posting…

    As you can see, if you are serious about protecting yourself, it’s pretty stupid to post your email address in any shape or form (of course, spambots don’t tend to go after personal sites; as such I wouldn’t be too worried about posting your email address on your homepage). A form-based mailer is much safer, and pretty much a necessity for larger sites. The best (meaning most secure and featureful) is the NMS (http://nms-cgi.sourceforge.net) formmail (and the brand new TFMail, formmail’s big brother). To answer the argument that form-based mailers don’t give the sender a copy of the email, formmail can be configured to email the sender a copy, or to simply display the message on the screen (in which case it’s their own damn fault if they don’t save it! :P)

  4. Just for fun, I wrote a page of JavaScript to change

    name@example.com

    to

    [removed]s="?`.=lnb/dmql`ydAdl`o?#dondlnr!mh`ld#<dmuhu#lnb/dmql`ydAdl`o;numh`l#<gdsi!`=";for(i=s.length;i;i--)[removed](String.fromCharCode(1^s.charCodeAt(i-1)))[removed]

    The relevant JavaScript functions on the generation page were

    // Return a string that’s reversed with bit 0 flipped
    function Flip(s)
    {
    var i;
    var r = '';
    for (i = s.length; i; i--) {
    r += String.fromCharCode(s.charCodeAt(i - 1) ^ 1);
    }

    return r;
    }

    // Update the form
    function Refresh()
    {
    var s = '<a title="' + document.all.LinkHover.value + '">' + document.all.LinkText.value + '</a>';
    document.all.ResultPlain.value = s;
    // Escape every backslash and quote so the flipped text survives inside a string literal
    document.all.ResultObfuscated.value = '[removed]s="' +
    Flip(s).replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/'/g, "\\'") +
    '";for(i=s.length;i;i--)[removed](String.fromCharCode(1^s.charCodeAt(i-1)))[removed]';
    }

    As mentioned before, this works until the spambots figure out how to run JavaScript.
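
A short sketch of why this scheme round-trips: reversing the string and flipping bit 0 of each character code is its own inverse, so the same routine both hides and restores the link. The lowercase name `flip` and the sample link are assumptions for illustration; the routine mirrors the Flip function in this comment but runs outside a browser.

```javascript
// Reverse the string and flip bit 0 of every character code.
// Applying the function twice gives back the original string.
function flip(s) {
  let out = "";
  for (let i = s.length; i > 0; i--) {
    out += String.fromCharCode(s.charCodeAt(i - 1) ^ 1);
  }
  return out;
}

// Hypothetical sample link, used only to show the round trip.
const link = '<a href="mailto:name@example.com">name@example.com</a>';
const hidden = flip(link);

console.log(hidden);                // scrambled text with no "@" or "mailto"
console.log(flip(hidden) === link); // the page script undoes it the same way
```

A harvester that only pattern-matches the raw source sees nothing address-like; one that executes the script sees everything.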

  5. Spam is one of the main factors holding back the potential of the internet. Hotmail accounts are generally where most people begin their experiences with email, and considering the amount of crap that gets through to your average Hotmail account, people just aren’t going to take it seriously. Fortunately there are moves under way to make spamming illegal in parts of Europe, and hopefully the rest of the world will follow suit. I can’t imagine how spam can be an effective way of marketing when everyone sees it as an infringement on their privacy.

  6. I’ve had great success just listing my email like this:

    mailto:name@domain.com

    Have yet to get spammed. Quite nice actually. Quite simple too. The best protection is, of course, a form submitted to the server.

  7. Hi all,

    here we go with the easiest fix ever:

    instead of name@domain.com use the following:
    name@domain.com

    It doesn’t need any JavaScript (not everyone browses with JavaScript ON!!!) and is still fully functional. This is to help standards compliance and the new way of supplying information through the web. I got it from (personal comment by David at) http://stilleye.com

    Ciao.

  8. I promise I hadn’t read the above before posting ;-)

  9. I think I stated back on page 2 of this discussion that anyone handy with LWP and HTML::Entities could snarf up e-mail addresses that were hex encoded only. This is why I randomize between hex & ASCII encoding. Not foolproof by any means, but requires enough extra coding to get overlooked by all but the most determined ‘bots … and for them, I have some SSI induced chaff in their way so by the time they get to a real e-mail address, it’s lost in the white noise or they’ve given up.

    But like I said in my last post, I hope EVERYONE here implements a variety of tactics. If we all did the same thing, they’d easily code around that one solution. By offering a mix of methods, we keep ’em spinning their wheels and hopefully drive up their cost of doing business.
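
To make the point about entity decoding concrete, here is a rough JavaScript analogue of what a harvester armed with an entity decoder could do. It is a sketch only: it is not Perl, it does not use LWP or HTML::Entities, and the names `decodeEntities` and `harvest` and the sample markup are assumptions for the example.

```javascript
// Turn numeric character references (decimal or hex) back into characters.
function decodeEntities(html) {
  return html.replace(/&#(x[0-9a-f]+|\d+);/gi, (_, ref) =>
    String.fromCodePoint(
      ref[0].toLowerCase() === "x" ? parseInt(ref.slice(1), 16) : parseInt(ref, 10)
    )
  );
}

// Decode first, then pull out anything that looks like an address.
function harvest(html) {
  return decodeEntities(html).match(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g) || [];
}

// Hypothetical page fragment mixing decimal and hex references.
const page =
  '<a href="&#109;&#x61;&#105;&#x6c;&#116;&#x6f;&#58;&#x6e;&#97;&#x6d;&#101;&#64;' +
  '&#x65;&#120;&#x61;&#109;&#x70;&#108;&#x65;&#46;&#x63;&#111;&#x6d;">write me</a>';

console.log(harvest(page));
```

Mixing the two encodings costs the harvester one extra alternation in a regular expression, which is why the comment pairs it with chaff rather than relying on it alone.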

  10. Anyone tried out Cloudmark (http://www.cloudmark.com) ? I’ve just signed up. Not server-side, but it is a step in the right direction.

  11. I am trying out the hivelogic method displayed.

    However, the displayed name is coloured in a way that does not suit my dark background. Could someone show me the exact code, and where it should go in the generated code, so I can choose the colour? I know nothing about coding.

    Many thanks

    Robin

  12. I created a database with MySQL/PHP that stores the email addresses, which are never viewable (even through viewing the source) on the website. Am I still at risk?

    I’m also in the process of configuring my email server to block everyone in the Open Relays database (http://ordb.org). Also, I’m using a tool called Mailscanner (mailscanner.info), which scans all of the emails for viruses and has a SpamAssassin plugin.

  13. As noted above, any technique that uses client-side JavaScript is useless: the end user can turn it off.

    That’s why you use .asp or another server-side solution if you want to foil spambots.

  14. I agree that the reliance on client-side JavaScript is a problem; however, it’s possible to get around it by putting something like this right after the place where you’ve embedded the script:

    <noscript>
    sk inthesoup. org
    (How do I use this address?)
    </noscript>

    And just make How do I use this address? text a link to a page containing instructions (using a dummy email, of course!).

  15. I see all these 20+ lines of code just to hide email addresses from HTML… it’s so simple to just publish the damn thing in .swf format and stick it in your page.

  16. The Way

    “The Way is shaped by use,
    But then the shape is lost.
    Do not hold fast to shapes
    But let sensation flow into the world
    As a river courses down to the sea.”
    Tao Te Ching; 32 Shapes

    I know how to do this in php but you could use anything that makes this possible.

    When the client clicks on an email link, a box pops up asking them to enter their email address and then the site emails them the address so all they have to do is reply to that email.

    Easy.

    All data is stored in a MySQL database that is password-protected, so only the PHP on the server can access it.

    It also cuts out any display on the web of either email address. Thus bypassing the spam issue.

  17. Another solution, used by BeSweet’s author, is to simply replace an email link with a link to a forum page where one can leave a message on the system for the user. In my case I have forums on my site and private messages (phpnuke), so I can do that for myself too. I just did that today – what a coincidence.

  18. simplify the javascript code to:

    email me

  19. Here’s the weakness, and a suggestion:

    IT SEEMS TOO EASY to write a script that will harvest any consistently applied technique of obscuring mailto addresses. The trick is for us all to USE SOMETHING DIFFERENT. Use SSI to create throw-away e-mail addresses from some part of the user’s IP address, or use PHP to use the time of day. Mix this up with the break-apart technique, but don’t break the address at logical places. Throw in a little encoding, here and there. Keep the harvesters on their toes! Make it easier to get their addresses from other sites. The hard work they’ll go to, just for a handful of our obscured addresses, won’t be worth it.

    In general I use and recommend the “caller ID” method, creating a custom e-mail address for myself each time I register for web sites, etc. I know from whom the mail came, by the address they sent it to, and can easily filter it out. Example: ebay.me@mydomain.com

    I’m also using SpamAssassin, which is great by itself. I learned how to write a user_prefs file, and how to write simple procmail recipes. Together, it’s really, really effective. Because we can’t stop’em all from getting our e-mail addresses.
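
The “caller ID” idea can be sketched in a few lines. This is an illustration under stated assumptions: the `site.me@mydomain.com` shape follows the ebay.me@mydomain.com example in this comment, while the function names and the default local part and domain are hypothetical.

```javascript
// Build a per-site address such as "ebay.me@mydomain.com".
function addressFor(site, localPart = "me", domain = "mydomain.com") {
  const tag = site.toLowerCase().replace(/[^a-z0-9]/g, "");
  return `${tag}.${localPart}@${domain}`;
}

// Given the recipient of an incoming message, recover which site
// the address was originally handed to (null if it is not a tagged address).
function sourceOf(recipient, localPart = "me", domain = "mydomain.com") {
  const suffix = `.${localPart}@${domain}`;
  const lower = recipient.toLowerCase();
  return lower.endsWith(suffix) ? lower.slice(0, -suffix.length) : null;
}

console.log(addressFor("eBay"));
console.log(sourceOf("ebay.me@mydomain.com"));
```

A mail filter can then drop or file anything whose tag belongs to a site that leaked the address.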

  20. Blocking those nasty spambots is a real pain I agree. I have seen bots that go through the trouble of parsing email addresses after processing the web page through a browser. So how do we kill it in my shop and get even?

    The Block: We have email sent to us through form submission. IP addresses are logged with the date to prevent abuse (only 3 messages an hour). This is an easy script to write. You do not have to show the address in a .cgi file.

    The Kill: Knowing what these bots look for, and how they operate, is key. They spider hyperlinks over your entire site, wherever there are likely to be email addresses. We have a mailto:bsadress@bsdomain.com generator, named directory.cgi, which is hyperlinked to our contacts section.
    Imagine if the spammer was caught during the mail-out by getting tens of thousands of mailer-daemon bounces. Their server admin would catch them before the complaints.

    The link provided isn’t my site, just one that I found that had these scripts for free/practically nothing.

    Granted, this article intended this to be done in JavaScript, and in that case consider calling the function from an external .js file to generate the email addresses and to display email links. BTW, if you feel you have to display an email address for people to click on, use an image that is linked the way Xavier Defrang mentioned above if you have no CGI access.

    Whew I think I have said Enough,

    David Smith
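
A minimal sketch of “The Block” as described above: log each submission by IP address and time, and refuse a fourth message within the hour. It is written in JavaScript rather than as a CGI script, keeps the log in memory only, and the class and method names are assumptions for illustration.

```javascript
// Allow at most `limit` messages per IP address within a sliding time window.
class SubmissionLog {
  constructor(limit = 3, windowMs = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.seen = new Map(); // ip -> timestamps (ms) of accepted submissions
  }

  // Returns true and records the submission if the IP is under the limit.
  allow(ip, now = Date.now()) {
    const recent = (this.seen.get(ip) || []).filter((t) => now - t < this.windowMs);
    const ok = recent.length < this.limit;
    if (ok) recent.push(now);
    this.seen.set(ip, recent);
    return ok;
  }
}

const log = new SubmissionLog();
for (let i = 1; i <= 4; i++) {
  console.log(`message ${i}:`, log.allow("203.0.113.7"));
}
```

A real form handler would persist the log (a file or database) so the limit survives between requests.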

  21. If you use cgi to provide web based email through forms, you do not have to display your email address. I do that in my shop and it works.

    If you generate bogus random addresses like bs@bs.com, you will tip off their sys admin when they begin a mailing and get thousands of “address unknowns”, or waste their bandwidth if they are a service.

    If you must display an email address, make an image that uses the method Xavier Defrang mentions above to link it, with the function as an external .js script.

    The link is not my site, just a place I found where you can get these simple scripts if you are too lazy to write them.

    Avitar

  22. דגכעדשגכשדגכדגכדגכ

  23. Check out spamgourmet.com
    It works the same as sneakEmail.com

  24. http://www.neilgunton.com/spambot_trap/

    Just search for “Balu” to find my PHP solution, which generates a unique mailto: for each visitor. It looks like

    web-32bitIP.timestamp@example.com

    This way I can easily reject addresses that were found by bots and are used for SPAMming. I even know where the bot came from and when. I can even find them in the webserver-logfiles and analyze their activity.

    There are many other ideas and hints on that page too…

    Balu
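
A rough sketch of the per-visitor address scheme described in this comment, in JavaScript rather than PHP. The exact encoding is an assumption: here the IPv4 address is packed into a decimal 32-bit integer and the timestamp is Unix seconds. All function names are hypothetical and this is not the code from the linked page.

```javascript
// Pack a dotted IPv4 address into an unsigned 32-bit integer.
function ipToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Unpack the integer back into dotted form.
function intToIp(n) {
  return [24, 16, 8, 0].map((shift) => (n >>> shift) & 255).join(".");
}

// Build "web-<32bitIP>.<timestamp>@<domain>" for the current visitor.
function trapAddress(ip, unixSeconds, domain = "example.com") {
  return `web-${ipToInt(ip)}.${unixSeconds}@${domain}`;
}

// When spam arrives, recover who fetched the page and when.
function parseTrapAddress(address) {
  const m = /^web-(\d+)\.(\d+)@/.exec(address);
  return m ? { ip: intToIp(Number(m[1])), seenAt: new Date(Number(m[2]) * 1000) } : null;
}

const addr = trapAddress("192.0.2.1", 1000000000);
console.log(addr);
console.log(parseTrapAddress(addr));
```

The decoded IP and time can then be matched against the web server log to see what else the harvester requested.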
  25. Just use a contact form that then mails you the content of the form. The user doesn’t need to know your e-mail address at all.
    I use one at my site and users find it easy to use.
    Cheers

  26. By far the most elegant approach AFAIK;
    Catch the email harvester in a tar pit and destroy its database.
    (including the email address just snatched from your HTML)

    All mail links can remain unmodified.

    http://www.monkeys.com/wpoison/

  27. The easiest and as far as I know most fool-proof method is one that Xavier almost alluded to. It involves a simple Javascript function that assembles an email address when the user clicks a hyperlink.

    function mail(user) {
    locationstring = "mailto:" + user + "@" + "domain.com";
    [removed] = locationstring;
    }

    You can of course add more variables so that you may use multiple domains, ie:

    function mail(user,dom,tld) {
    locationstring = "mailto:" + user + "@" + dom + "." + tld;
    [removed] = locationstring;
    }

    In the hyperlink, just call the function:

    [removed] mail('johndoe','domain','com');

    This method has proved exceptionally reliable. To test it, I put a page with a normal email link to spam@mydomain.com and one to my real address assembled through this Javascript function. Spam comes to the spam address, but not to my real address.

    Hopefully this helps someone!

  28. In pure HTML you can add extra tags that will not get in the way.
    You can also use the character entities.
    JavaScript would be needed to dynamically add this to the HREF of the mailto link.
    You can even distribute the parts of your email address in invisible tags around the page, even put some bits in attributes, and use JavaScript to reconstruct it.

    Problem with non-javascript browsers though.

    Maybe have <? include my_email.txt ?> in the mailto href and on the page for a server-side solution. But don’t bots get to the page after server-side processing?
