brainbaking/combating-spam-with-email-obfuscation.md at 1d55b598ae873bfc5b982900af359454c84a8ecc

title: Combating Spam With Email Obfuscation
date: 2023-11-22T22:08:40+01:00
tags: webdesign, spam

The articles here at Brain Baking end with a footer that contains the author bio and ways to contact me, including my preferred channel, email. Instead of including a simple <a href='mailto:blah@blieh.com' /> link, however, the email link is replaced by the actual email address when clicked, with the help of a small JavaScript function. Why?

The question shouldn't be why; everyone knows why: to help keep spam bots at bay. The question instead should be how, as recent emails clearly indicate that the current method still isn't watertight. It's not that I receive heaps of spam on a daily basis: Brain Baking simply isn't popular enough for that. Still, I'd like to quickly go over a few methods you can employ to keep bots from scraping email addresses too easily from your website. Others like Spencer Mortensen tested these methods by setting up different email addresses and monitoring the incoming amounts of spam. Silvan Mühlemann even waited 1.5 years to see the impact of each approach!

The easiest method, applied as early as the dawn of the internet, is to simply remove a few characters or to spell out the @ symbol. Instead of putting up a link, you write: 'contact me via info at mydomain dot com'. Unfortunately, that simply doesn't work anymore. Even a very simple string matcher can still pick up the address. What you're doing here is making it difficult for your visitors to email you while also making it easy for scrapers to steal the address.
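To illustrate how little effort such a string matcher needs, here is a minimal sketch. The function name `findSpelledOutEmails` and the regular expression are assumptions made for this example, not code taken from an actual spam bot.

```javascript
// Hypothetical scraper helper: recovers addresses written as "name at domain dot tld".
// Accepts plain, [bracketed] and (parenthesised) spellings of "at" and "dot".
const AT = String.raw`(?:at|\[at\]|\(at\))`;
const DOT = String.raw`(?:dot|\[dot\]|\(dot\))`;
const SPELLED_OUT = new RegExp(
  String.raw`\b([\w.+-]+)\s+${AT}\s+([\w-]+(?:\s+${DOT}\s+[\w-]+)+)`,
  'gi'
);

function findSpelledOutEmails(text) {
  const dots = new RegExp(String.raw`\s+${DOT}\s+`, 'gi');
  // Rebuild each match as local@domain, turning every " dot " into ".".
  return [...text.matchAll(SPELLED_OUT)].map(
    ([, local, domain]) => `${local}@${domain.replace(dots, '.')}`
  );
}

console.log(findSpelledOutEmails('contact me via info at mydomain dot com'));
// [ 'info@mydomain.com' ]
```

A dozen lines are enough to undo the trick, while a human visitor still has to retype the address by hand.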

A slightly more involved approach is to resort to HTML character entities that replace characters like .com with &#46;&#99;&#111;&#109;. Online encoders can speed up the process. Again, most spam bots are clever enough to scan for encoded @ signs with common domain TLDs, so I don't think this will get you very far.
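As a rough sketch of both sides of this technique: the first function produces the entity-encoded form, the second shows how easily a bot could reverse it. Both function names are made up for this example.

```javascript
// Encode every character as a decimal HTML entity, e.g. "." becomes "&#46;".
function toHtmlEntities(text) {
  return Array.from(text, (ch) => `&#${ch.codePointAt(0)};`).join('');
}

// What a scraper can do in return: turn hex and decimal entities back into characters.
function fromHtmlEntities(html) {
  return html
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(Number(dec)));
}

const encodedAddress = toHtmlEntities('info@mydomain.com');
console.log(encodedAddress);
console.log(fromHtmlEntities(encodedAddress)); // info@mydomain.com
```

Browsers decode the entities before rendering, so visitors see a normal address, but so does any scraper that runs a decoding step like this one.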

What about adding JavaScript to inject the href attribute (or perhaps the <a/> tag itself) after clicking on a link? Success varies with the implementation. I've been using a simple ROT13-style letter rotation that's good enough (the snippet here shifts by 12 rather than 13). A simple click listener replaces the inner HTML of a <span/> element:

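  // The link markup is stored letter-rotated, so the address never appears in plain text.
  const meel = document.querySelector('.meel');
  const enc = "<o ofwo-zopsz='aowz orrfsgg' vfst='aowzhc:kcihsf@pfowbpoywbu.qca'>kcihsf@pfowbpoywbu.qca</o>";
  // Shift each letter forward by 12 places, wrapping around within its own case.
  const decode = (s) => s.replace(/[a-zA-Z]/g, (c) => {
    const shifted = c.charCodeAt(0) + 12;
    return String.fromCharCode((c <= 'Z' ? 90 : 122) >= shifted ? shifted : shifted - 26);
  });
  // On click, drop the placeholder class and swap in the decoded mailto link.
  meel.addEventListener('click', function () {
    meel.setAttribute('class', '');
    meel.innerHTML = decode(enc);
  });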
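The encoded string has to be produced somewhere. A minimal sketch of a matching encoder follows, assuming the decoder shifts forward by 12 as in the snippet above. The `rotate` helper and the example address are illustrative only and not part of the site's code.

```javascript
// Rotate ASCII letters by `shift` places (0 or more), wrapping within upper or lower case.
function rotate(text, shift) {
  return text.replace(/[a-zA-Z]/g, (c) => {
    const base = c <= 'Z' ? 65 : 97; // char code of 'A' or 'a'
    return String.fromCharCode(((c.charCodeAt(0) - base + shift) % 26) + base);
  });
}

// The click handler decodes by shifting forward 12, so encoding shifts by 26 - 12 = 14.
const link = "<a href='mailto:info@mydomain.com'>info@mydomain.com</a>";
const encodedLink = rotate(link, 14);

console.log(encodedLink);
console.log(rotate(encodedLink, 12) === link); // true
```

Only letters are rotated, so the @ sign, dots and quotes stay visible in the page source. A bot would still need to guess the shift to recover a usable address.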

Remember that you'll need <noscript/> alternatives---mine simply dictates the address by relying on the visitors' knowledge of the author and the domain: "my name at this domain".

We don't really need JS and could just rely on a few CSS tricks¹ to create a "honeypot", a trap for bots where they think they extracted the correct email address while in fact you're handing out the wrong thing. A few examples:

- Use display:none to append nonsense: info<span style='display: none;'>gotcha</span>@mydomain.com. Your readers won't see the gotcha but the bots (hopefully) will. You can't create a clickable mailto: link with this technique though.
- Use direction overrides: <span style='unicode-bidi:bidi-override; direction: rtl;'>moc.niamodym@ofni</span>. One of the more original approaches!
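To see what a bot would walk away with, here is a sketch of a deliberately naive scraper that ignores CSS, strips the tags, and grabs the first address-shaped string. `scrapeEmail` and its loose pattern are assumptions made for illustration; real scrapers may well be smarter.

```javascript
// Hypothetical naive scraper: drop all tags, then look for something address-shaped.
function scrapeEmail(html) {
  const text = html.replace(/<[^>]*>/g, '');
  const match = text.match(/[\w.+-]+@[\w-]+(?:\.[\w-]+)*/);
  return match ? match[0] : null;
}

// The markup for the direction trick holds the address reversed.
const reversedAddress = [...'info@mydomain.com'].reverse().join('');

const hidden = "info<span style='display: none;'>gotcha</span>@mydomain.com";
const rtl = `<span style='unicode-bidi:bidi-override; direction: rtl;'>${reversedAddress}</span>`;

console.log(scrapeEmail(hidden)); // infogotcha@mydomain.com
console.log(scrapeEmail(rtl));    // moc.niamodym@ofni
```

In both cases the harvested string is not the real address, which is exactly the point of the honeypot.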

If you don't mind adding even more cruft to your site, you can require visitors to first complete a captcha such as Google's reCAPTCHA before revealing your contact information. Personally, if I had to go through a process like that, I wouldn't bother emailing the person.

Besides cleverly hiding your email address in plain sight in the HTML/JS, cloud-based service providers such as CloudFlare offer advanced email obfuscation methods that do not lose the advantage of simply clicking on a link: for your visitors, nothing visible will change. Of course, this requires your website to be served through CloudFlare.

You can also expect your contacts to only send encrypted email and simply dump all non-encrypted ones into your spam folder. But the email protocol was never meant to be encrypted, so for your super-secret messages you should stop using encrypted email and resort to other means of messaging instead, such as Signal.

Others like Kev Quirk rely on disposable subdomains or temporary email addresses with forwards that can also be handed out when filling in a form, negating the need to obfuscate anything in HTML.

The problem worsens when you take into account my "Reply Via Email" mailto <a/> link that is embedded in RSS <description/> tags. There is no way of knowing which RSS clients are able to handle the above CSS or JS tricks, and to ensure maximum compatibility, there shouldn't be any obfuscation at all. Since RSS is made to be parsed easily, more spam has reached me this way than through the site itself. Semi-hiding the address by spelling it out loses the ease of use of a simple reply button.

I even forgot that I wired my RSS feed to my GoodReads profile, meaning all of a sudden my email address is embedded into a popular Amazon-owned external website. Whoops. The easiest solution is not to have such a link in your feed, but I do get a few emails that way, so currently it's worth the occasional spam that will put SpamSieve to work anyway. If the shit hits the fan, I can still delete the "hifromrss" email alias.

Most spam I receive unfortunately doesn't come from bots but from pushy people thinking I'm in dire need of SEO help and willing to pay large sums of money for it, frequently reminding me that I didn't respond to their initial generous request. To those "manual spammers", I have only one thing to say: bugger off.

¹ Do be aware that some aggressive forms of CSS obfuscation can break the accessibility of your website! Come to think of it, that includes the captcha approach.
