Rel=”nofollow” Follow Up

The rel=”nofollow” attribute is certainly coming into effect already, with quite a few prominent weblogs implementing it themselves or installing a patch, update or plugin.

In my previous comment about it, I mentioned that I felt it isn’t the search engine’s job to filter out spam, and that it should rest on the owner of the site to make sure their particular backyard on the internet is mowed.

With that in mind, we clearly need to come up with some alternative methods to combat spam. There are a few options which would slow down most spammers, though not all. Let’s investigate a few of them.

The first is mandatory registration on your site to leave a comment. The problem with forced registration is that it doesn’t lend itself to someone following a link to your site and leaving a comment. Signing up on every site is just a pain in the arse, you know it and so do I, so for the moment, that isn’t an option.

Secondly, I think forcing comment moderation is an option. However, if you have an active blog, the inherent workload for the owner is quite high. There is also the downside that people leaving comments on your site can’t view them, or participate with other users, until you approve their comments. Not ideal, so we’ll leave it for the moment.

Third, and this isn’t all that likely: allow anyone to post comments to your site and have their comments go live, but examine them for spam content before posting. This is fine, except where the spammers leave a non-spam-like comment with a link, at which point it gets posted and they get their reward. We could take it further and parse their input, pull down the text of the page they are linking to and parse the HTML for illegal keywords (in the same line of thinking as Squid might if it was proxying content).
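As a rough sketch of the keyword scan described above, something like the following could work. It assumes the linked page's HTML has already been downloaded into a string; the keyword list, the threshold of 3 and the function names are all made up for illustration and would need tuning on a real site.

```python
import re
from html.parser import HTMLParser

# Hypothetical keyword list, purely for illustration.
SPAM_KEYWORDS = {"casino", "poker", "viagra"}


class _TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping script and style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def count_spam_hits(html, keywords=SPAM_KEYWORDS):
    """Return how many words in the page text appear in the keyword set."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())
    return sum(1 for word in words if word in keywords)


def looks_like_spam(html, threshold=3):
    """Assumed policy: flag the page once it reaches `threshold` hits."""
    return count_spam_hits(html) >= threshold
```

What happens to a flagged comment (hide it, drop the link, or hold it for moderation) is a separate decision that this sketch leaves open.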

Fourth, and this is really a category of tactics: user interrogation when they post. For instance, before they post, they have to enter a string that is blurred within an image (done before). What about a random but easily answered question? This line of thinking, I think, would make it much harder for spammers to automate their attacks, especially if the challenge was random.
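To make the random question idea a bit more concrete, here is a minimal sketch. It assumes the expected answer is kept server-side (in the commenter's session, say) and never sent to the browser; the function names and the simple addition question are just placeholders.

```python
import random


def make_challenge(rng=random):
    """Return a (question, expected_answer) pair for a simple sum."""
    a = rng.randint(1, 9)
    b = rng.randint(1, 9)
    question = f"What is {a} plus {b}?"
    return question, str(a + b)


def check_answer(expected, submitted):
    """Compare the stored answer with what the commenter typed."""
    return submitted.strip() == expected
```

A single question type like this would be easy to script around, so in practice you would probably want a pool of different question styles to pick from.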

Fifth, change the way we accept comments. For instance, most spammers will pick a particular type of blogging software and attack it because it is simple. Look at MT (Movable Type): when you submit a comment with that software, the feedback is always posted to comments.cgi or the like. If I were a spammer, that would make my life very simple. Let’s make it more complex: make the submission URL synthetic, so they can’t hardcode it. Let’s link the synthetic URL to their session id and make it available for only x minutes at a time. Check that the referrer for the submission is in fact your own site and that the HTTP header information is all there and intact.
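One way the synthetic URL could be sketched is with a signed, time-limited token in the path. Everything here is an assumption for illustration: the /comment/ path prefix, the secret key, the 10 minute lifetime standing in for "x minutes", and the function names. It uses an HMAC over the session id and issue time so the server doesn't have to store each URL.

```python
import hashlib
import hmac
import time
from urllib.parse import urlsplit

SECRET_KEY = b"replace-with-a-long-random-secret"  # hypothetical server secret
TOKEN_LIFETIME_SECONDS = 600  # stands in for "x minutes"


def _sign(session_id, issued_at):
    """HMAC-SHA256 signature binding the session id to the issue time."""
    message = f"{session_id}:{issued_at}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_submit_path(session_id, now=None):
    """Build a submission path that only works for this session, for a while."""
    issued_at = int(time.time() if now is None else now)
    return f"/comment/{issued_at}-{_sign(session_id, issued_at)}"


def is_valid_submit_path(path, session_id, now=None):
    """Check the path's signature against the session and its age."""
    now = time.time() if now is None else now
    prefix = "/comment/"
    if not path.startswith(prefix):
        return False
    issued_str, sep, signature = path[len(prefix):].partition("-")
    if not sep or not issued_str.isdecimal():
        return False
    issued_at = int(issued_str)
    if not 0 <= now - issued_at <= TOKEN_LIFETIME_SECONDS:
        return False
    expected = _sign(session_id, issued_at)
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))


def referrer_is_own_site(referer, own_host):
    """True if the Referer header points at our own host."""
    return bool(referer) and urlsplit(referer).hostname == own_host
```

Keep in mind the Referer header is supplied by the client and can be forged or missing, so that check only raises the bar a little; the signed path is doing most of the work.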

At this point, I haven’t thought the fifth item right through; however, I feel that there might actually be some merit in it. What about a combination of all of them, varying from submission to submission, just to keep them guessing a little?

What ideas have crossed your mind about it?

3 thoughts on “Rel=”nofollow” Follow Up”

  1. I think fighting spam is all about maximising spammers’ effort and thus minimising their profit. Hopefully, by making spamming not cost effective, it’ll gradually die down. For example: eliminate automation of spamming by requiring some kind of human intervention (entering text from a GIF, confirmation email, random sanity-checking questions, etc). Slow down the rate of spamming (compulsory delay in the script, throttling, etc). Filter content (spammy word filtering, RBL-style central database, etc).

    Or alternatively, maybe we should just have a law that heavily penalises the spammers :)

  2. That is a pretty similar standpoint to what I take on the matter as well, Scott. I honestly believe that if it wasn’t cost effective for spammers, that would initially kill off a significant volume of them. If the huge rewards for spamming were removed as well, then clearly it would die off, as there wouldn’t be a good reason for them to continue.

    I suppose the reality of it is, as long as money can be made through the internet, then there will always be some reason for them to spam. Our job will be to continually make it harder and harder for them to do it efficiently.

    I’ve been pondering writing a couple of different plugins for WordPress to test some of these thoughts out. The one I’m quite interested in is pulling down the linked site and scanning it for keywords. Chances are that if it is a spammer, the site will be related to porn, gambling or drugs of some sort. If you get x number of positive hits for it, you either don’t display the post at all, don’t link the URL, or link the URL with rel=”nofollow” (I’d prefer to not link it, period, to be honest).

    Have you considered any new sorts of anti-spam type mechanisms at all?

  3. The only problem with downloading the linked site is that it can affect page views, which is particularly important with regard to ads on that page. A page with ads will register page views and then money will be exchanged between the advertiser and host site. So if I was a spammer, I would spam my site everywhere, then all those blogs would download my site to verify its non-spamminess, and my ad hits would skyrocket, making me money from the advertisers.

    Softcoding the comment.cgi URL and using a sessionID as well in the URL is probably the best way mentioned. It’s a little more resource intensive because it then requires more database hits to compare who’s using the URL, but it’s a small price. Even more than sessionID, I would use an ID that’s a derivative of the session AND the instance of hitting “Leave a comment”, with some expiration time.

    Plus it is safer because you’re in 100% control of everything in the process, avoiding any unintentional consequences from using 3rd party content (foreign page).

    heh. I just noticed this post is a year old :)
