title: Fighting Webmention And Pingback Spam
date: 2022-04-24T19:38:00+02:00
tags: indieweb, webmentions, pingbacks
categories: webdesign

My websites use both the modern Webmention and the conventional Pingback systems to communicate with other websites. I've written about implementing your own Webmention server and my first IndieWeb experience before (hey, that was about a year ago!), so I won't cover the basics here. I still stand by my decision to support Pingbacks, which are much more popular (WordPress owns half of the internet).

And herein lies the pickle. More popular obviously also means more prone to spam. IndieWeb enthusiasts are quick to dismiss a working concept and invent their own (instead of posting XML, you're posting a form. Great progress, guys!), partly "because Pingback comes with a lot of spam". Unfortunately, I'm here to report that Webmention endpoints also receive their share of shit. That's why I spent all weekend redesigning my Go-Jamming Jamstack server to better combat spam.

Go-Jamming converts a Pingback to a Webmention, so internally, they're the same and adhere to the same rules. Once the Pingback XML manages to get decoded, that is---most of the spam I see coming my way comes in the form of malformed XML! Hilarious. Here's the log output of a failing one:

Apr 18 22:47:28 vps-domain go-jamming[1523793]: {"level":"error","error":"pingback POST: Unable to unmarshal XMLRPC <?xml version=\"1.0\" encoding=\"utf-16\" standalone=\"yes\"?>\r\n<methodCall>\r\n\t<methodName>pingback.ping</methodName>\r\n\t<params>\r\n\t\t<param>\r\n\t\t\t<value><string>https://aylesbur-drains dot couk</string></value>\r\n\t\t</param>\r\n\t\t<param>\r\n\t\t\t<value><string>https://brainbaking.com/post/2022/03/an-ad-leaflet-qr-design-mistake/</string></value>\r\n\t\t</param>\r\n\t</params>\r\n</methodCall>: xml: encoding \"utf-16\" declared but Decoder.CharsetReader is nil","time":1650322048,"message":"Pingback receive went wrong"}

That British -drains domain name comes in many forms with different prepended words, but they're all the same. A -drains blacklist entry helps here. I've obfuscated part of the domain name as I don't want to give spammers the pleasure of a link, even an implicit one.
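The decoding error in that log line is easy to reproduce with Go's standard encoding/xml package: a decoder without a CharsetReader refuses any document that declares a non-UTF-8 encoding. The following is a minimal sketch, not Go-Jamming's actual code; the methodCall struct, the helper names, and the example.com-style URLs are made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// methodCall mirrors only the parts of an XML-RPC pingback.ping request
// needed here (hypothetical struct, not taken from Go-Jamming).
type methodCall struct {
	MethodName string   `xml:"methodName"`
	Params     []string `xml:"params>param>value>string"`
}

const pingTemplate = `<?xml version="1.0" encoding="%s"?>
<methodCall>
  <methodName>pingback.ping</methodName>
  <params>
    <param><value><string>https://source.example/post</string></value></param>
    <param><value><string>https://target.example/article</string></value></param>
  </params>
</methodCall>`

// pingBody builds a pingback request that declares the given encoding.
func pingBody(encoding string) string {
	return fmt.Sprintf(pingTemplate, encoding)
}

// decodePingback decodes the body with a default decoder, so no
// CharsetReader is configured and only UTF-8 declarations are accepted.
func decodePingback(body string) (methodCall, error) {
	var call methodCall
	err := xml.NewDecoder(strings.NewReader(body)).Decode(&call)
	return call, err
}

func main() {
	for _, enc := range []string{"utf-8", "utf-16"} {
		call, err := decodePingback(pingBody(enc))
		if err != nil {
			fmt.Printf("%s: rejected: %v\n", enc, err)
			continue
		}
		fmt.Printf("%s: %s %v\n", enc, call.MethodName, call.Params)
	}
}
```

Leaving CharsetReader unset therefore acts as an accidental spam filter: a sender that declares utf-16 never gets past the decoder.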

Even if the encoding were correct, the request would probably fail the next check: if you claim to mention this site on your page, Go-Jamming double-checks that. It parses your HTML (the source) to see if there's an href to my site (the target). If not, it simply aborts the process and bins the attempt. For Webmentions specifically, Go-Jamming also checks the form encoding and the presence of the required form fields (source and target). Also, if a source comes from a blacklisted domain, the fun ends here.
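To make those two checks concrete, here is a rough sketch of what "does the source really link to the target" and "is the source blacklisted" could look like. It is an illustration under assumptions, not Go-Jamming's implementation: the function names are invented, and a naive regular expression stands in for a real HTML parser to keep the example dependency-free.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// hrefPattern is a deliberately naive stand-in for a real HTML parser.
var hrefPattern = regexp.MustCompile(`(?i)href\s*=\s*["']([^"']+)["']`)

// linksTo reports whether any href in the source HTML points at the target,
// ignoring a trailing slash on either side.
func linksTo(sourceHTML, target string) bool {
	for _, m := range hrefPattern.FindAllStringSubmatch(sourceHTML, -1) {
		if strings.TrimSuffix(m[1], "/") == strings.TrimSuffix(target, "/") {
			return true
		}
	}
	return false
}

// isBlacklisted reports whether the source hostname contains a blacklisted
// fragment such as "-drains". Unparseable sources count as blacklisted.
func isBlacklisted(source string, blacklist []string) bool {
	u, err := url.Parse(source)
	if err != nil || u.Hostname() == "" {
		return true
	}
	host := strings.ToLower(u.Hostname())
	for _, entry := range blacklist {
		if strings.Contains(host, entry) {
			return true
		}
	}
	return false
}

// accept combines both checks: the domain must be clean and the link real.
func accept(source, sourceHTML, target string, blacklist []string) bool {
	return !isBlacklisted(source, blacklist) && linksTo(sourceHTML, target)
}

func main() {
	blacklist := []string{"-drains"}
	target := "https://brainbaking.com/post/2022/03/an-ad-leaflet-qr-design-mistake/"
	page := `<p>Nice read: <a href="` + target + `">QR mistake</a></p>`

	fmt.Println(accept("https://blog.example/reply", page, target, blacklist))
	fmt.Println(accept("https://blog.example/reply", "<p>buy now</p>", target, blacklist))
	fmt.Println(accept("https://cheap-drains.example/post", page, target, blacklist))
}
```

Only the first call passes: the second page never links to the target, and the third source matches the -drains entry.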

Next, if there isn't any microformat data, and Go-Jamming can't fill in the blanks (Pingbacks usually do not come from IndieWeb-powered sites, so I still have a parseBodyAsNonIndiewebSite func implemented), the process again stops. "But what about brute-force attempts?" Good thinking: Go-Jamming comes with helmet on---security-related HTTP headers and built-in rate limiting.
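For readers wondering what "helmet on" amounts to in practice, the sketch below wraps a handler with a couple of security headers and a fixed-window rate limiter per client IP. This is an assumption-laden illustration using only the standard library; the header selection, the /webmention path, the limits, and all names are hypothetical and not taken from Go-Jamming.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// limiter counts hits per client IP within a fixed time window.
type limiter struct {
	mu     sync.Mutex
	max    int
	window time.Duration
	hits   map[string]int
	reset  time.Time
}

func newLimiter(max int, window time.Duration) *limiter {
	return &limiter{max: max, window: window, hits: map[string]int{}, reset: time.Now().Add(window)}
}

// allow records a hit for ip and reports whether it is within the limit.
func (l *limiter) allow(ip string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if now := time.Now(); now.After(l.reset) {
		l.hits = map[string]int{} // window expired: start counting afresh
		l.reset = now.Add(l.window)
	}
	l.hits[ip]++
	return l.hits[ip] <= l.max
}

// protect sets security headers and rejects clients that exceed the limit.
func protect(l *limiter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-Content-Type-Options", "nosniff")
		w.Header().Set("X-Frame-Options", "DENY")
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr
		}
		if !l.allow(ip) {
			http.Error(w, "too many requests", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// demoHandler returns a protected endpoint that accepts max requests per minute.
func demoHandler(max int) http.Handler {
	accepted := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusAccepted)
	})
	return protect(newLimiter(max, time.Minute), accepted)
}

// statusCodes sends n POST requests from the same client and collects the codes.
func statusCodes(h http.Handler, n int) []int {
	codes := make([]int, 0, n)
	for i := 0; i < n; i++ {
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/webmention", nil))
		codes = append(codes, rec.Code)
	}
	return codes
}

func main() {
	fmt.Println(statusCodes(demoHandler(2), 3))
}
```

With a limit of two requests per minute, the third request from the same address is answered with 429 instead of reaching the endpoint.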

Still, some spammers manage to make it all the way through. A new trend I've discovered lately is AI-generated writing: deploy an actual WordPress site, let the computer generate an article that looks like it was written by someone (i.e. one that doesn't merely repeat "buy now"), secretly paste in the victim's URL, and fire off those Pingbacks. Since the URL is valid and the XML is well-formed, it passes the tests.

So now what? Since my Hugo-powered sites automatically download the latest webmentions, and those were valid, I ended up with unwanted comments on my live sites. My manual blacklist system wasn't up to the job. Instead, I implemented proper whitelist and blacklist systems, not unlike the "in moderation" commenting system in WordPress:

  1. A seemingly OK mention is received. Add it to the moderation queue (a separate database that doesn't pollute the validated ones);
  2. A notification (email) is sent out with a link to either approve or reject the mention from the unknown domain;
  3. If the mention is rejected, it is deleted forever and the domain is added to the blacklist;
  4. If the mention is accepted, it is moved to the validated database and the domain is added to the whitelist.
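The four steps above can be sketched as a small in-memory state machine. This is a simplified illustration, not Go-Jamming's code: all type and method names are hypothetical, maps stand in for the two databases, and it assumes that mentions from already whitelisted domains skip the queue while blacklisted ones are dropped on arrival.

```go
package main

import "fmt"

// mention is a simplified received Webmention (or converted Pingback).
type mention struct {
	Source string // page that claims to link to us
	Domain string // domain of the source, used for the lists
}

// moderator keeps the lists and both stores in memory for this sketch.
type moderator struct {
	whitelist, blacklist map[string]bool
	queue, approved      map[string]mention // keyed by source URL
	notify               func(mention)      // e.g. email with approve/reject links
}

func newModerator(notify func(mention)) *moderator {
	return &moderator{
		whitelist: map[string]bool{},
		blacklist: map[string]bool{},
		queue:     map[string]mention{},
		approved:  map[string]mention{},
		notify:    notify,
	}
}

// receive files a mention that already passed validation (step 1 and 2).
func (mod *moderator) receive(m mention) string {
	switch {
	case mod.blacklist[m.Domain]:
		return "dropped"
	case mod.whitelist[m.Domain]:
		mod.approved[m.Source] = m
		return "approved"
	default:
		mod.queue[m.Source] = m
		mod.notify(m)
		return "queued"
	}
}

// reject deletes a queued mention and blacklists its domain (step 3).
func (mod *moderator) reject(source string) bool {
	m, ok := mod.queue[source]
	if !ok {
		return false
	}
	delete(mod.queue, source)
	mod.blacklist[m.Domain] = true
	return true
}

// approve moves a queued mention to the validated store and
// whitelists its domain (step 4).
func (mod *moderator) approve(source string) bool {
	m, ok := mod.queue[source]
	if !ok {
		return false
	}
	delete(mod.queue, source)
	mod.approved[source] = m
	mod.whitelist[m.Domain] = true
	return true
}

func main() {
	mod := newModerator(func(m mention) {
		fmt.Println("notify: approve or reject", m.Source)
	})
	spam := mention{Source: "https://spam.example/ai-post", Domain: "spam.example"}
	friend := mention{Source: "https://friend.example/reply", Domain: "friend.example"}

	fmt.Println(mod.receive(spam))
	fmt.Println(mod.receive(friend))
	mod.reject(spam.Source)
	mod.approve(friend.Source)

	// Later mentions from the same domains no longer need a decision.
	fmt.Println(mod.receive(mention{Source: "https://spam.example/another", Domain: "spam.example"}))
	fmt.Println(mod.receive(mention{Source: "https://friend.example/second", Domain: "friend.example"}))
}
```

Keeping the queue separate from the approved store means that anything awaiting a decision can never leak onto the live site.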

In case the notification system fails or I want to process stuff in bulk, I even created a (very) crude admin dashboard:

This was surprisingly easy to implement in Go: put a //go:embed dashboard.html directive above a var dashboardTemplate []byte and simply leverage the superb built-in text/template parsing support. That package also powers Hugo's template system, by the way, so I already knew my way around things like {{ range .Collection }} {{ . }} {{ end }}. The admin part of Go-Jamming requires a simple token as part of the URL, which is set in the config file (the miauwkes part in the above screenshot). For the moment, that's more than good enough.
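A stripped-down version of that idea might look like the sketch below. To keep it runnable without an extra file, the template lives in an inline literal, and the comment shows where the embed directive would go. The data types, the template contents, and the authorized helper are assumptions for illustration, not the actual dashboard.

```go
package main

import (
	"bytes"
	"crypto/subtle"
	"fmt"
	"text/template"
)

// With a real file next to the source, these bytes would be embedded instead
// (this also needs a blank import of the "embed" package):
//
//	//go:embed dashboard.html
//	var dashboardTemplate []byte
//
// An inline literal keeps this sketch self-contained.
var dashboardTemplate = []byte(`<h1>{{ len .Mentions }} in moderation</h1>` +
	`<ul>{{ range .Mentions }}<li>{{ .Source }}</li>{{ end }}</ul>`)

// queued and dashboardData are hypothetical view models for the template.
type queued struct {
	Source string
}

type dashboardData struct {
	Mentions []queued
}

// renderDashboard parses the template bytes and executes them with data.
func renderDashboard(data dashboardData) (string, error) {
	tmpl, err := template.New("dashboard").Parse(string(dashboardTemplate))
	if err != nil {
		return "", err
	}
	var out bytes.Buffer
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

// authorized compares the token from the URL with the configured one
// without leaking where the two strings differ through timing.
func authorized(urlToken, configToken string) bool {
	return subtle.ConstantTimeCompare([]byte(urlToken), []byte(configToken)) == 1
}

func main() {
	if !authorized("s3cret", "s3cret") {
		fmt.Println("forbidden")
		return
	}
	html, err := renderDashboard(dashboardData{Mentions: []queued{
		{Source: "https://a.example/1"},
		{Source: "https://b.example/2"},
	}})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(html)
}
```

One design note: text/template does not escape its output, so for a page that displays URLs submitted by strangers, html/template offers the same syntax with contextual escaping.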

There is one big downside to this system: I'm still confronted by the spam. A message that requires an action (approve or reject) still ends up in my mailbox whenever a spammer changes domain to circumvent the blacklist. I hate that, and I was looking into auto-fetching lists of known spam domains like Pi-hole does, but that isn't 100% foolproof, and those lists already lack entries I now have in my blacklist. I could slowly build up mine and publish that one as part of the project, but Brain Baking isn't that popular so it'll take a while.


"So are you ready to remove that stupid <link rel="pingback" in your header now, Wouter?" To which I reply: no, I am not. I like the fact that my Go-Jamming server can send and receive Pingbacks, and I've had a few genuinely interesting interactions that way. I suppose micro-bloggers who regularly post links to (WordPress) sites will also appreciate it.

If you are interested in joining the IndieWeb community or enabling Webmentions (and Pingbacks!), but don't want to either be dependent on Webmention.io or implement your own server, please give Go-Jamming a "go" (ha!). It passes all the Webmention Rocks! tests and I've put in a lot of effort to make running it effortless: the repository comes with README and INSTALL instructions, and if you're stuck, please let me know!