A lot of the sites Plexus builds have a built-in blog component that allows user comments. As with any front-end form, these can attract a lot of spam, primarily of the "Cheap Rolex Watches Louis Vuitton Handbags" variety, where a spam-bot loads up the comment with a ton of spam keywords and links. Over the years at Plexus we have employed several tactics for combatting this.
Originally we tried requiring the site administrator to approve all comments before they were shown on the site. This is an effective means of keeping the spam off the site, but it puts a lot of extra responsibility onto the site admin, and the queue of comments waiting to be approved can really pile up if the admin doesn't log in frequently to approve or reject them. This method also takes away most of the "real-time" aspect of commenting, since a legitimate comment may not appear until long after it was written.
The next method we employed, with some success, was CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). You're probably already familiar with CAPTCHA -- that's where you have to look at an image of jumbled letters and numbers and then enter them correctly into a text field in order to prove that you're a human and not a spam-bot. Requiring the user to correctly enter the CAPTCHA code does cut down on spam, but these days it doesn't eliminate it entirely. It seems the spammers are constantly improving their technology to keep up with anti-spam methods, and some of them appear to have cracked CAPTCHA. Thus, even though we had employed CAPTCHA on the Plexus blog, we were still receiving a lot of spam through the comments. That was the worst of both worlds: we were requiring legitimate commenters to jump through an extra hoop, and we still weren't catching all of the spam.
Our next step was to look into some of the spam-fighting services available to web developers and to make use of those services on our sites. The first such service I found out about was Akismet (http://akismet.com/), and later I discovered Defensio (http://defensio.com/). They both work in pretty much the same way: When a comment is submitted on your site, you use an API key to make a call to the service, which analyzes the content of the comment along with such information as the submitter's IP address and request headers, then returns its verdict -- either spam or not spam. Based on the verdict from the service, you can then put the comment into an approval queue (not shown on the site) or show it immediately (if the service decides the comment is legit). Defensio even offers filtering for profanity and blocks links to certain categories of websites.
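To make that flow concrete, here is a minimal Ruby sketch of the "ask the service, then queue or publish" step. It is only an illustration: the Comment struct, the moderate method and the checker argument are hypothetical names invented for this example, and checker merely stands in for whatever call your Akismet or Defensio wrapper actually provides. It is not either library's real API.

```ruby
# Hypothetical sketch: route a comment based on a spam service's verdict.
# `checker` is any callable that returns true for spam and false for ham;
# it stands in for a real Akismet/Defensio call and is NOT their actual API.

Comment = Struct.new(:author, :body, :ip, :user_agent, :status)

def moderate(comment, checker)
  # Send the comment text plus request details, as the services expect.
  spam = checker.call(
    content: comment.body,
    ip: comment.ip,
    user_agent: comment.user_agent
  )
  # Spam goes to the approval queue (hidden); everything else is shown.
  comment.status = spam ? :pending_review : :published
  comment
rescue StandardError
  # Assumption: if the service fails, hold the comment for the admin
  # rather than publishing it unchecked.
  comment.status = :pending_review
  comment
end

# A fake checker for local experiments: flags a couple of spam keywords.
keyword_checker = lambda do |content:, ip:, user_agent:|
  content.match?(/rolex|vuitton/i)
end

spammy = Comment.new("bot", "Cheap Rolex Watches", "203.0.113.9", "curl", nil)
puts moderate(spammy, keyword_checker).status
```

Keeping the service call behind a single callable like this also means the rest of the app does not care which service is answering, which is handy when comparing the two.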
The good news for us is that there are good Ruby wrappers already written for both Defensio and Akismet, which makes integrating them into a Rails app a breeze. I used the Defender Ruby gem to interface with Defensio and the Akismetor plugin to interface with Akismet. API keys for both services are available for free: Akismet offers a free personal key and paid keys for corporate use. Defensio's keys are free for both personal and corporate use while Defensio is in beta -- they will soon announce a pricing structure for corporate API keys.
In order to test the two services head to head, I've put Akismet on the Plexus site and Defensio on another new site we recently rolled out. I'm going to compare the amount of spam each site receives through its comments and see which does a better job of recognizing spam. So far in my tests I have found both to be quite accurate. If the service erroneously marks a legitimate comment as spam, or vice versa, you can correct it. Both services claim to "learn" as they receive more and more of this corrective feedback from users, so over time the accuracy of the service is supposed to improve.
The thing I like about each of these services is that by integrating them into your own code, you're not necessarily handing the decision over to them. Your code can get the service's verdict and then decide for itself what to do with the comment. For instance, Defensio doesn't simply return a thumbs up or down; it also returns a decimal value between 0 and 1 representing the "spaminess" of a comment, with 0 being a completely legitimate comment and 1 being obvious spam. So you could use that value to handle comments differently depending on their "spaminess" value: for example, allowing everything with a spaminess value up to 0.4, quarantining everything with a value between 0.4 and 0.7 for admin review, and automatically discarding everything with a spaminess value above 0.7. Defensio actually recommends that you not use the spaminess value for anything other than sorting, but you can see the potential for filtering there. I suppose you could even submit the comment to both services, get both opinions, and then make the call from there, but that would probably be overkill and add unnecessary time to the request.
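As a rough illustration of that three-way split, the Ruby sketch below maps a spaminess score to an action using the example thresholds from the paragraph above. The method name triage, the constant names and the returned symbols are made up for this example, and the cut-offs are just the illustrative values mentioned here, not anything Defensio recommends.

```ruby
# Hypothetical triage based on a 0..1 "spaminess" score.
# Thresholds are the example values from the post, not official guidance.
ALLOW_MAX      = 0.4  # at or below this: publish immediately
QUARANTINE_MAX = 0.7  # above ALLOW_MAX, up to this: hold for admin review

def triage(spaminess)
  unless (0.0..1.0).cover?(spaminess)
    raise ArgumentError, "spaminess must be between 0 and 1"
  end

  if spaminess <= ALLOW_MAX
    :publish
  elsif spaminess <= QUARANTINE_MAX
    :quarantine
  else
    :discard
  end
end

[0.1, 0.55, 0.95].each do |score|
  puts "#{score} -> #{triage(score)}"
end
```

Treating a score of exactly 0.4 as publishable and exactly 0.7 as quarantined is a choice made for this sketch; the original description leaves the boundaries open, so pick whichever side suits your tolerance for spam.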
So if you find yourself constantly weeding through comment spam, look into a spam-fighting service like Defensio or Akismet to keep your meatcloud-sourced content spam- and malware-free.
UPDATE (5/25/2010): In the month or so since I installed Akismet on this site, I have found it to have about a 95% success rate. Not bad, but not perfect. Some spam did still slip through. Today I switched to Defensio, and I will report back on how that turns out.