Blog comment spam, the scourge of the internet. Having written yet another CMS to power jerakeen.org, I wanted comments on pages again. Django rocks hard – adding commenting was easy. And a day later, I have comment spam. Bugger.
From a purely abstract point of view, I find this interesting. There must be a spider looking for forms that look like things that can take comments. And the robots must be reasonably flexible – it’s not like my CMS is an off-the-shelf. But from a more concrete, ‘spam bad’ point of view, it’s bloody annoying.
So begins my personal battle against spam. Others have fought this battle, but of course the downside of rolling your own site is that you can’t use anything off the shelf. My plan was to forget trying to recognise and filter spam, and preferably I don’t want to have to moderate anything – I don’t want the spam to be submitted at all. And this really can’t be that hard. Unless there’s a human surfing for blogs and typing in the spam themselves, this should really just be a measure of my ability to write a Turing test. Right?
Or perhaps they figure it out once, and use a replay attack to hit every page? Not really a good assumption with hindsight, but never mind. I added prefixes to the form fields that were generated from the current time, and checked at submit time that the fields weren’t more than an hour old. This saves me from having to store state anywhere, and gains me forms that exipre after a while, unless you reverse-engineer the timestamp format. But I’m premising the existence of some automated tool, perhaps with a little human interaction. I don’t need to be perfect, I merely need to be not as bad as everyone else… But no, this failed too.