fighting webmention and pingback spam

Wouter Groeneveld 2022-04-24 20:21:33 +02:00
parent 5ca8c420d7
commit 8baca197ae
2 changed files with 47 additions and 0 deletions


@@ -0,0 +1,47 @@
---
title: Fighting Webmention And Pingback Spam
date: 2022-04-24T19:38:00+02:00
tags:
- indieweb
categories:
- webdesign
---
My websites use both the modern Webmention and conventional Pingback systems to communicate with other websites. I've written about [implementing your own Webmention server](/post/2021/05/beyond-webmention-io/) and [my first IndieWeb experience](/post/2021/03/the-indieweb-mixed-bag/) before (hey, that was about a year ago!), so I won't cover the basics here. I still stand by my decision to support Pingbacks, which are _much_ more popular (WordPress owns half of the internet).
And herein lies the pickle. More popular obviously also means more prone to spam. IndieWeb enthusiasts are quick to dismiss a working concept and invent their own (instead of posting XML, you're posting a form. Great progress, guys!), also "because Pingback comes with a lot of spam". Unfortunately, I'm here to report that Webmention endpoints also receive their share of shit. That's why I spent all weekend redesigning my [go-jamming Jamstack server](https://github.com/wgroeneveld/go-jamming) to better combat spam.
Go-Jamming converts a Pingback to a Webmention, so internally, they're the same and adhere to the same rules. Once the Pingback XML manages to get decoded, that is---most of the spam I see coming my way comes in the form of malformed XML! Hilarious. Here's the log output of a failing one:
```
Apr 18 22:47:28 vps-5f7dac56 go-jamming[1523793]: {"level":"error","error":"pingback POST: Unable to unmarshal XMLRPC <?xml version=\"1.0\" encoding=\"utf-16\" standalone=\"yes\"?>\r\n<methodCall>\r\n\t<methodName>pingback.ping</methodName>\r\n\t<params>\r\n\t\t<param>\r\n\t\t\t<value><string>https://aylesbur-drains dot couk</string></value>\r\n\t\t</param>\r\n\t\t<param>\r\n\t\t\t<value><string>https://brainbaking.com/post/2022/03/an-ad-leaflet-qr-design-mistake/</string></value>\r\n\t\t</param>\r\n\t</params>\r\n</methodCall>: xml: encoding \"utf-16\" declared but Decoder.CharsetReader is nil","time":1650322048,"message":"Pingback receive went wrong"}
```
That British `-drains` domain name comes in many forms with different prepended words, but they're all the same. A `-drains` blacklist entry helps here. I've obfuscated part of the domain name as I don't want to give spammers the pleasure of a link, even an implicit one.
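To make the failure above concrete, here is a minimal sketch of decoding a Pingback XML-RPC body into the source/target pair a Webmention needs, followed by a substring blacklist check. The names `methodCall`, `parsePingback`, and `isBlacklisted` are made up for illustration and the example domains are placeholders; this is not Go-Jamming's actual code. It relies only on the standard `encoding/xml` package, which refuses any declared encoding other than UTF-8 unless a `CharsetReader` is configured.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"strings"
)

// methodCall mirrors the XML-RPC body of a pingback.ping request.
type methodCall struct {
	MethodName string   `xml:"methodName"`
	Params     []string `xml:"params>param>value>string"`
}

// blacklist holds substrings of unwanted source domains (illustrative data).
var blacklist = []string{"-drains"}

// parsePingback decodes the XML-RPC body and returns (source, target).
func parsePingback(body []byte) (string, string, error) {
	var call methodCall
	// xml.Unmarshal uses a decoder without a CharsetReader, so a body that
	// declares encoding="utf-16" is rejected right here.
	if err := xml.Unmarshal(body, &call); err != nil {
		return "", "", fmt.Errorf("unable to unmarshal XMLRPC: %w", err)
	}
	if call.MethodName != "pingback.ping" || len(call.Params) != 2 {
		return "", "", errors.New("not a pingback.ping call with two params")
	}
	return call.Params[0], call.Params[1], nil
}

// isBlacklisted reports whether the source URL contains a blacklisted substring.
func isBlacklisted(source string) bool {
	for _, entry := range blacklist {
		if strings.Contains(source, entry) {
			return true
		}
	}
	return false
}

func main() {
	valid := `<?xml version="1.0" encoding="utf-8"?><methodCall><methodName>pingback.ping</methodName><params><param><value><string>https://blog.example/post</string></value></param><param><value><string>https://brainbaking.com/post/</string></value></param></params></methodCall>`
	spam := strings.Replace(valid, "utf-8", "utf-16", 1)
	for _, body := range []string{valid, spam} {
		source, target, err := parsePingback([]byte(body))
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("webmention:", source, "->", target, "blacklisted:", isBlacklisted(source))
	}
}
```

A substring match is blunt, but it catches every prefix variation of the same domain family with a single entry.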
Even if the encoding were correct, the ping would probably fail the following check: if you claim to mention this site on your page, Go-Jamming double-checks that. It parses your HTML (the source) to see if there's an `href` to my site (the target). If not, it simply aborts the process and bins the attempt. For Webmentions specifically, Go-Jamming also checks the form encoding and the presence of the correct form attributes. Also, if a source comes from a blacklisted domain, the fun ends here.
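A rough sketch of those two checks, under a few assumptions: `validateMention` and `mentionsTarget` are hypothetical names, the exact rules Go-Jamming applies may differ, and the link check here uses a simple regular expression instead of a real HTML parser to keep the example dependency-free. A proper parser would be more robust against commented-out markup and unusual attribute quoting.

```go
package main

import (
	"errors"
	"fmt"
	"mime"
	"net/url"
	"regexp"
	"strings"
)

// hrefPattern is a simplification: it grabs quoted href values without
// building a DOM, so it also matches links inside HTML comments.
var hrefPattern = regexp.MustCompile(`(?i)href\s*=\s*["']([^"']+)["']`)

// normalize trims whitespace and one trailing slash so that
// ".../post/" and ".../post" compare as equal.
func normalize(link string) string {
	return strings.TrimSuffix(strings.TrimSpace(link), "/")
}

// mentionsTarget reports whether the source HTML links to the target URL.
func mentionsTarget(sourceHTML, target string) bool {
	want := normalize(target)
	for _, match := range hrefPattern.FindAllStringSubmatch(sourceHTML, -1) {
		if normalize(match[1]) == want {
			return true
		}
	}
	return false
}

// isWebURL accepts only absolute http(s) URLs.
func isWebURL(raw string) bool {
	u, err := url.Parse(raw)
	return err == nil && (u.Scheme == "http" || u.Scheme == "https") && u.Host != ""
}

// validateMention checks the form encoding and the source/target fields.
func validateMention(contentType, source, target string) error {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil || mediaType != "application/x-www-form-urlencoded" {
		return errors.New("expected a form-encoded POST")
	}
	if !isWebURL(source) || !isWebURL(target) {
		return errors.New("source and target must be http(s) URLs")
	}
	if source == target {
		return errors.New("source and target must differ")
	}
	return nil
}

func main() {
	page := `<p>Nice post by <a href="https://brainbaking.com/post/">Wouter</a>.</p>`
	fmt.Println(validateMention("application/x-www-form-urlencoded; charset=utf-8",
		"https://blog.example/reply", "https://brainbaking.com/post"))
	fmt.Println(mentionsTarget(page, "https://brainbaking.com/post"))
}
```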
Next, if there isn't any microformat data, and Go-Jamming can't fill in the blanks (Pingbacks usually do not come from IndieWeb-powered sites, so I still have a `parseBodyAsNonIndiewebSite` func implemented), the process again stops. "But what about brute-force attempts?" Good thinking: Go-Jamming comes with _helmet on_---security-related HTTP headers and built-in rate limiting.
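For the curious, security headers plus rate limiting can be sketched as a single piece of middleware. Everything below is an assumption for illustration: the `limiter` type, the `protect` function, the chosen headers, and the fixed-window strategy are not taken from Go-Jamming, which may well use a library or a different algorithm.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// window counts the requests of one client since the window started.
type window struct {
	start time.Time
	count int
}

// limiter is a naive fixed-window rate limiter keyed by client IP.
// It never evicts old keys, which a long-running server would need to do.
type limiter struct {
	mu    sync.Mutex
	limit int
	per   time.Duration
	seen  map[string]*window
}

func newLimiter(limit, perSeconds int) *limiter {
	return &limiter{
		limit: limit,
		per:   time.Duration(perSeconds) * time.Second,
		seen:  map[string]*window{},
	}
}

// allow reports whether the client identified by key may make another request.
func (l *limiter) allow(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	win, ok := l.seen[key]
	if !ok || now.Sub(win.start) >= l.per {
		l.seen[key] = &window{start: now, count: 1}
		return true
	}
	win.count++
	return win.count <= l.limit
}

// protect sets a few security-related headers and applies the limiter.
func protect(l *limiter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := w.Header()
		h.Set("X-Content-Type-Options", "nosniff")
		h.Set("X-Frame-Options", "DENY")
		h.Set("Referrer-Policy", "no-referrer")
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr // fall back to the raw address
		}
		if !l.allow(ip) {
			http.Error(w, "too many requests", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	accept := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusAccepted)
	})
	handler := protect(newLimiter(2, 60), accept)
	// Three requests from the same address: the third exceeds the limit.
	for i := 0; i < 3; i++ {
		rec := httptest.NewRecorder()
		handler.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/webmention", nil))
		fmt.Println(rec.Code, rec.Header().Get("X-Content-Type-Options"))
	}
}
```

Behind a reverse proxy, `RemoteAddr` is the proxy's address, so the key would have to come from a trusted forwarding header instead.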
Still, some spammers manage to make it all the way through. A new trend I've discovered lately is AI-powered writing: deploy an actual WordPress site, let the computer generate an article that looks like it was written by someone (i.e. one that doesn't merely repeat "buy now"), secretly paste in the victim's URL, and fire off those Pingbacks. Since the URL is valid and the XML is well-formed, it passes the tests.
So now what? Since my Hugo-powered sites automatically download the latest webmentions, and those were valid, we end up with unwanted comments on our live site. My manual blacklist system wasn't up to the job. Instead, I implemented proper whitelist and blacklist systems not unlike the "in moderation" commenting system in WordPress:
1. A seemingly OK mention is received. Add it to the moderation queue (a separate database that doesn't pollute the validated ones);
2. A notification (email) is sent out with a link to either approve or reject the mention from the unknown domain;
3. If the mention is rejected, it is deleted forever and the domain is added to the blacklist;
4. If the mention is accepted, it is moved to the validated database and the domain is added to the whitelist.
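The four steps above can be sketched with a couple of in-memory maps. This is an illustration only: the `Moderator` type and its methods are hypothetical, the real thing persists to separate databases and sends email, and I'm assuming here that mentions from already whitelisted or blacklisted domains skip the queue entirely.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// Mention is the minimal data needed for moderation.
type Mention struct{ Source, Target string }

// Moderator keeps unreviewed mentions apart from validated ones.
type Moderator struct {
	queue     map[string]Mention // in moderation, keyed by source URL
	approved  []Mention          // the "validated database"
	whitelist map[string]bool
	blacklist map[string]bool
	notify    func(Mention) // e.g. send the approve/reject email
}

func NewModerator(notify func(Mention)) *Moderator {
	return &Moderator{
		queue:     map[string]Mention{},
		whitelist: map[string]bool{},
		blacklist: map[string]bool{},
		notify:    notify,
	}
}

// domainOf extracts the host name of a source URL ("" if unparseable).
func domainOf(source string) string {
	u, err := url.Parse(source)
	if err != nil {
		return ""
	}
	return u.Hostname()
}

// Receive handles step 1 and 2: unknown domains are queued and announced.
func (m *Moderator) Receive(mn Mention) string {
	domain := domainOf(mn.Source)
	switch {
	case m.blacklist[domain]:
		return "rejected"
	case m.whitelist[domain]:
		m.approved = append(m.approved, mn)
		return "approved"
	default:
		m.queue[mn.Source] = mn
		m.notify(mn)
		return "queued"
	}
}

// take removes a mention from the moderation queue.
func (m *Moderator) take(source string) (Mention, error) {
	mn, ok := m.queue[source]
	if !ok {
		return Mention{}, errors.New("no such mention in moderation")
	}
	delete(m.queue, source)
	return mn, nil
}

// Reject is step 3: drop the mention and blacklist its domain.
func (m *Moderator) Reject(source string) error {
	if _, err := m.take(source); err != nil {
		return err
	}
	m.blacklist[domainOf(source)] = true
	return nil
}

// Approve is step 4: validate the mention and whitelist its domain.
func (m *Moderator) Approve(source string) error {
	mn, err := m.take(source)
	if err != nil {
		return err
	}
	m.approved = append(m.approved, mn)
	m.whitelist[domainOf(source)] = true
	return nil
}

func main() {
	m := NewModerator(func(mn Mention) { fmt.Println("mail: approve or reject", mn.Source) })
	target := "https://brainbaking.com/post/"
	fmt.Println(m.Receive(Mention{Source: "https://spam.example/ai-article", Target: target}))
	_ = m.Reject("https://spam.example/ai-article")
	fmt.Println(m.Receive(Mention{Source: "https://spam.example/another", Target: target}))
}
```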
In case the notification system fails or I want to process stuff in bulk, I even created a (very) crude admin dashboard:
![](../gojamming-admin.jpg "The Go-Jamming admin dashboard with dummy data.")
This was surprisingly easy to implement in Go: put `//go:embed dashboard.html` above a `var dashboardTemplate []byte` and simply leverage the superb built-in `text/template` parsing support. That package also powers Hugo's template system, by the way, so I already knew my way around things like `{{ range .Collection }} {{ . }} {{ end }}`. The admin part of Go-Jamming requires a simple token as part of the URL which is set in the config file (the `miauwkes` part in the above screenshot). For the moment, that's more than good enough.
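A stripped-down sketch of such a dashboard handler follows. To keep it runnable without extra files, the template is inlined as a string instead of being embedded; the `/admin/<token>` path layout, the `adminHandler` and `renderDashboard` names, and the template contents are all assumptions, not the project's real code.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"text/template"
)

// In a real binary this would come from a file next to the source:
//
//	import _ "embed"
//	(go:embed directive for dashboard.html)
//	var dashboardTemplate []byte
//
// It is inlined here so the sketch compiles on its own.
const dashboardHTML = `<ul>{{ range .Pending }}<li>{{ . }}</li>{{ end }}</ul>`

var dashboard = template.Must(template.New("dashboard").Parse(dashboardHTML))

// dashboardData is what the template ranges over.
type dashboardData struct{ Pending []string }

// renderDashboard executes the template into a string.
func renderDashboard(pending []string) (string, error) {
	var sb strings.Builder
	err := dashboard.Execute(&sb, dashboardData{Pending: pending})
	return sb.String(), err
}

// adminHandler only serves the dashboard when the URL carries the token.
func adminHandler(token string, pending func() []string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := strings.TrimPrefix(r.URL.Path, "/admin/")
		if subtle.ConstantTimeCompare([]byte(got), []byte(token)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		page, err := renderDashboard(pending())
		if err != nil {
			http.Error(w, "template error", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		fmt.Fprint(w, page)
	})
}

func main() {
	handler := adminHandler("s3cret", func() []string { return []string{"blog.example"} })
	for _, path := range []string{"/admin/s3cret", "/admin/wrong"} {
		rec := httptest.NewRecorder()
		handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
		fmt.Println(path, rec.Code)
	}
}
```

Note that `text/template` does not escape its output. Since the pending entries originate from strangers, `html/template`, which shares the same syntax, may be the safer pick for anything rendered in a browser.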
There is one big downside to this system: I'm still confronted with the spam. A message that requires an action (approve or reject) still ends up in my mailbox whenever a spammer changes domains to circumvent the blacklist. I hate that, and I was looking into auto-fetching known spam domains like the Pi-hole does, but that isn't 100% foolproof and those lists don't even include entries I already have in my blacklist. I could slowly build up mine and publish that one as part of the project, but Brain Baking isn't _that_ popular so it'll take a while.
---
"So are you ready to remove that stupid `<link rel="pingback"` in your header now, Wouter?" To which I reply: no, I am not. I like the fact that my Go-Jamming server can send and receive Pingbacks, and I've had a few genuinely interesting interactions that way. I suppose micro-bloggers that regularly post links to (WordPress) sites will also appreciate it.
If you are interested in joining the IndieWeb community or enabling Webmentions (and Pingbacks!), but don't want to either be dependent on [Webmention.io](https://webmention.io/) or implement your own server, please [give Go-Jamming a "go"](https://github.com/wgroeneveld/go-jamming) (ha!). It passes all the [Webmention Rocks!](https://webmention.rocks/) tests and I've put in a lot of effort to make running it effortless: there are `README` and `INSTALL` instructions on GitHub, and if you're stuck, please let me know!

Binary file not shown.
