joestelmach.com — Comment Spam

02 Dec 2006 Comment Spam

I hate comment spam. As if a mailbox full of solicitations to buy Cialis soft tabs were not enough, now my blog posts are being barraged with the stuff too. I obviously can't go on accepting all of these spam posts, so what's a guy to do? Here are some options:

Captchas: You know, the funny-looking skewed letters that you have to decipher to buy tickets on Ticketmaster. The idea here is that most spam bots can't read them, so nearly all of the comments that get through come from humans. The downside: they are a pain in the ass. Accessibility issues aside, I just don't like these things - half the time even my human eyes can't decipher what they say.
Ajax-only comments: The idea here is simple: only accept comment requests that ride in on the Ajax bus. Apparently, most comment-spam bots are designed to blast your site with regular old-school requests. For a Rails-based website (which this is) the implementation would be simple given the availability of the xhr? method on the request object. However, I have to ask: won't someone figure this out and start writing bots that make Ajax requests? What about the more obvious problem of shutting the users of non-Ajax browsers out of posting comments on your site? Not so much a fan of this solution either.
No anchors: The simplest scheme yet: don't allow anchor tags in your comments. Since the evil spammers are looking to place links on your page, you should be able to take care of the majority of the problem with this approach. The problem, however, is that this goes against a founding principle of the web: the ability to link from one page to another. I think it's great when someone leaves me a link to a site that's relevant to the conversation at hand. It would be a shame to allow the pillars of the web to falter at the mercy of these evil people writing spam bots.
Dictionary: How many times does a comment truly need to contain the word 'viagra' (especially in the context of software development)? Probably never. The dictionary approach uses this concept to separate human comments from bot comments by rejecting any comment that contains certain words. Generating a dictionary that gets reasonable results would probably not be that hard, but it would be in constant need of change as new drugs, gambling games, and cures for baldness come about.
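
To make the Ajax-only and dictionary options a little more concrete, here is a rough sketch in plain Ruby rather than inside a Rails controller. Everything named here is an assumption for illustration: `BLOCKED_WORDS`, `ajax_request?`, and `blocked_word?` are hypothetical, and the word list is just a sample. The header test mirrors what Rails' `xhr?` looks at (the `X-Requested-With` header), which also shows why it is weak: any bot can send that header.

```ruby
# Hypothetical sample list; a real one would need constant upkeep.
BLOCKED_WORDS = %w[viagra cialis casino].freeze

# Ajax-only check: most Ajax libraries send "X-Requested-With: XMLHttpRequest".
# `headers` is assumed to be a plain Hash of request header names to values.
def ajax_request?(headers)
  /XMLHttpRequest/i.match?(headers["X-Requested-With"].to_s)
end

# Dictionary check: true if any whole word in the comment is on the list.
def blocked_word?(comment)
  words = comment.downcase.scan(/[a-z0-9]+/)
  (words & BLOCKED_WORDS).any?
end
```

A comment handler could reject the post when `blocked_word?(body)` is true or when `ajax_request?(headers)` is false. Matching whole words rather than substrings keeps a word like "specialist" from tripping over "cialis".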

The solution I chose? No anchors. I know - that last bit about pillars and faltering is quite dramatic, but I just have to believe that this is the best solution for me right now. I put a regular-expression-based check in place, and it seems to be working well so far. Leave me a comment if you disagree with my position here - just be sure not to leave me a link :)
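
A regex check along these lines could look like the following Ruby sketch. The pattern and the names `ANCHOR_TAG` and `contains_anchor?` are assumptions for illustration, not necessarily the code in use on this site.

```ruby
# Matches an opening anchor tag such as <a>, <a href="...">, or <A HREF=...>.
# Requiring whitespace or ">" right after the "a" keeps tags like <abbr> from matching.
ANCHOR_TAG = /<a(\s[^>]*)?>/i

# True if the comment text contains an opening anchor tag.
def contains_anchor?(comment)
  ANCHOR_TAG.match?(comment)
end
```

A comment for which `contains_anchor?` returns true would simply be rejected. The pattern only looks for opening anchor tags, so a bare URL typed as plain text is not flagged.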
