Comment Spam: new trends, failing counter-measures and why it’s a big deal

recently, new bots rendered current anti spam techniques for blogs almost useless. here is a short write-up on the subject of comment spam, referrer spam and what’s currently happening in that area.

i have given the subject of comment spam a lot of thought and done a lot of checking into it. i came up with a few interesting findings.

if you don’t run a blog (which will make you an expert) and haven’t read about this subject in the past, just google it. you are all smart people. :)

basically though, comment spam is regular spam, only posted in blogs and other web pages where comments are possible. it is posted both for plain spam economics and to help improve the rankings of different sites in google and other search engines. the latter is often done by publicly known commercial companies.

i hope by the end of this post to demonstrate how serious blog spam is or at the very least that it deserves some extra attention if you dismissed it in the past.

first off, comment spam is abuse. abuse isn’t new, and as soon as a system shows up it will be abused. if not today, then 10 years from now.

it has long been an established yet not widely-known fact that if there are mistakes that can happen, they will happen. leaving a potential problem alive just because no one currently exploits it is terrible, and yet it keeps happening.
if the power grid for a significant part of the us can go down once every several years, so can any other system (if going down is the worst that can happen).

this is only relevant to comment spam in the way it is relevant to every other security related issue, and why is that?
because comment spam indeed isn’t a new thing. does anyone remember how big guest books used to be in the previous century? :)

and what about referrer spam?

some interesting things i noticed about what i have now named web spam / web content poisoning, or cspam (for comment spam):

[making a point about how silly it is to give new names to spam when it skips a medium.. what's your favorite? spit?]

automated spam is spam sent by a bulk-poster (a term borrowed from bulk-mailer). it visits web pages and posts spam to them.

recently we have seen a serious increase in comment spam activity. on one web site i recently started to help maintain, we get over 1000 spam comments a day. i won’t even start discussing the referrer spam poisoning we get.

the spam is no longer sent from just one ip address or even just a few. botnets are indeed blossoming in this field.

recently there has been a serious increase in spam, and much of it passes current spam detection techniques (such as blacklisting of ip addresses and spammed domains, javascript captchas, counting the urls in a comment, key words – useless anyway, some user captchas, etc.).

apparently, there is a new bot out there which gets past these previously successful defenses. further, anti spam technology in this realm is in no way mature or tried. mostly it is the product of heroic and very impressive efforts by people who are annoyed by the spam in their blogs.
so far it has been rather successful, but that window of success is closing.
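
as a rough illustration of the kind of checks listed above, here is a minimal python sketch that combines an ip blacklist, a domain blacklist and a url count. the function names, the threshold and the blacklist entries are all made up for this example (they are not taken from any real plugin), and a real filter would do far more.

```python
import re
from urllib.parse import urlparse

# Hypothetical blacklists and threshold, invented for this sketch.
BLOCKED_IPS = {"203.0.113.7"}
BLOCKED_DOMAINS = {"pills.example"}
MAX_URLS = 2

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_domains(text):
    """Return the lowercase host name of every http(s) URL found in text."""
    domains = []
    for url in URL_RE.findall(text):
        host = urlparse(url).hostname
        if host:
            domains.append(host.lower().rstrip("."))
    return domains


def looks_like_spam(ip, body, homepage=""):
    """Return the list of reasons a comment was flagged (empty if it passed)."""
    reasons = []
    if ip in BLOCKED_IPS:
        reasons.append("blacklisted ip")
    body_domains = extract_domains(body)
    if len(body_domains) > MAX_URLS:
        reasons.append("too many urls")
    # Look at the homepage field as well, not only the comment body.
    all_domains = body_domains + extract_domains(homepage)
    if any(domain in BLOCKED_DOMAINS for domain in all_domains):
        reasons.append("blacklisted domain")
    return reasons
```

a comment with no links at all, sent from an address that is not on the list, comes back with an empty list of reasons, which is part of why checks like these are not enough on their own.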

as an example, spammers started posting comments that quote the last paragraph of your text, or start with something relevant and then add:
“oh, by the way, have you tried viagra?”

on other occasions we see spam posts that detail how the poster searched the web for law related stuff, but ended up here. btw, if you are also interested in law… check out this page!

my all-time favorites are the posts that say:
“great blog! keep up the good work!”
“i liked what you’ve done here, keep it up!”

etc., with the spam url entered as their homepage, which is clickable from their nickname.

recently we have even seen one post that had:
“where do i find the rss feed for this blog?”

sometimes it is very difficult to avoid false positives even with a skilled human doing this full-time.

another type of spam we see is manual spam.
people enter the web page with their actual browser and type the spam manually. how much does a skilled illegal alien worker cost per day?

one such spam comment was recently posted on the site i mentioned (guess which one) in a blog entry about symantec. it talked of symantec, then suddenly changed tone and said that their anti spam (of all things) had failed them. it suggested using a competitor which worked for them.

when looking at the attacking bots, what we mostly find these days are:
45% open proxies
40% compromised machines
10% misc
5% unknown

(i haven’t actually calculated the numbers, but that’s roughly right)

misc being anything from a completely open installation of a vnc server to.. your guess is as good as mine.

examples of captured spam and google-poisoning attempts are abundant, so i won’t bore you. suffice to say every blog gets very specific spam surrounding its topic, as well as the usual peaks in this or that type of spam. lately the house special is pharmacy spam.

referrer spam is still mostly about porn.

looking at gangs, we managed, as an example, to identify a very big eastern european gang (probably one noisy guy or gal), but when they noticed our attention they disappeared for a while.

another important point to make is the domains used. much like with email spam, these change very frequently and seem to be registered in bulk. i don’t doubt these are the same people.

i am now talking with many who are active in this field, and we are establishing a working group/mailing list to address these issues mitigation-wise operationally, as well as research into new trends, bad guys, etc.

some of the solutions already proposed that we are working on are better blacklisting services and combined handling of the different types of such poisoning in web applications, from comments to referrers, plus other things i’d rather not discuss until they are a bit clearer.

i hope i managed to convince some people of how big this really is. we have all heard of blog spam; i and many people around me just didn’t realize the scale until we started working on it.

i figured it’s time to let others know as well.

something can be done about this now to make it less of a threat in coming years. i bet most of us will wait until we have to fight it as a fire, so it will keep evolving and come back to haunt us.

if i didn’t convince you yet of the risks, there have already been successful worms exploiting such techniques, some examples:
my previous post on blog attacks.
matthew murphy’s post on the xanga worm.

i will update on my (and our) findings on this subject on the securiteam blogs site (http://blogs.securiteam.com/).

this quick & dirty write-up can be found here:
http://blogs.securiteam.com/index.php/archives/285.

gadi evron,
ge@beyondsecurity.com.

  • hmm

    consider it this way; blogs are typically a way for a company to sneakily ‘spam’ readers with their company (*glares at banner on the left*), so consider it payback?

    i don’t see comment spam as a problem. it’s relatively easy to solve; timeout per ips, validating comments if required, captchas if you like, whatever.

    what’s the big deal?

  • sunshine

    This is the blogs site for securiteam, so… *staring at banner on the left*

    Try running a blog and handling the comments using your methods. :)

    Captchas may do some of the work, but:
    1. They are annoying.
    2. People would rather not post at all than use them.
    3. If they are hard enough not to be broken, they are on many occasions too difficult for even a human to understand.

    Payback? Spam? Are you ok? :)

  • hmm

    Okay okay.

    Here’s a tip:

    Make your blog boring (hardly difficult these days). This makes you less of a target …

    But seriously, I agree on captchas; not really a great solution for this problem. I don’t see how a bit of “comment moderation” is really bad or annoying. I ran a blog once and just moderated away the annoying posts. (Just like running a mailing list …)

    The real issue, anyway, is that sending bandwidth to you is pretty much free … so you either have to accept the spam that may or may not come, or just moderate!

    I still don’t see the big deal.

  • http://www.shaolintiger.com ShaolinTiger

    I run a few fairly high traffic blogs, and I have to say hmm, your solution is not practical at all.

    At one of the blogs, during a particularly bad period I could experience 12,000 spam comments in a day, and perhaps 30-50 real ones..

    I’d like to see you wade through that and moderate it :)

  • hmm

    How does it compare to moderating a mailing list? Why is it any different than that?

    Anyway, I would suggest that I can glance at 50 “posts” and decide pretty quickly which are spam and which are not. 12k/50 is still a lot, but not more than a few minutes work …

    Even still, what about the simple solution of posts-per-ip-per-second. ? Or even just simply a delayed script processing time.

    data = ...;
    sleep(100);
    process(data);

    Then some start “lockdown” / “stop comment processing” system.
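
    A minimal sketch of the posts-per-IP idea, in Python. The class name, the limits and the injectable clock are assumptions made for this illustration, not part of any existing blog software.

```python
import time
from collections import defaultdict, deque


class PerIpThrottle:
    """Allow at most max_posts comments per IP within window seconds."""

    def __init__(self, max_posts=3, window=60.0, clock=time.monotonic):
        self.max_posts = max_posts
        self.window = window
        self.clock = clock  # injectable so the throttle can be tested
        self._recent = defaultdict(deque)  # ip -> times of accepted posts

    def allow(self, ip):
        now = self.clock()
        stamps = self._recent[ip]
        # Forget accepted posts that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_posts:
            return False
        stamps.append(now)
        return True
```

    Note that this only limits each address separately. Spam that arrives from many different addresses, as the post describes, stays under the limit at every one of them.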

    Still don’t see it ..

  • sunshine

    Wanna work for me? If you can do it in just a few short minutes, you’re either superman or not very attached to reality.

  • http://www.whiteacid.org WhiteAcid

    You could restrict commenting to level0 users and open registration. It’s not a neat method as it’ll dissuade most one time commenters but it would probably work quite well.

    You could also allow anonymous commenting but require that to be moderated.

    Have you tried viagra yet?

  • sunshine

    Spam bots these days register. :)

    Viagra? I heard there is this new thing called.. err.. let me check my spam folder.

  • Steve

    I too have seen 50,000+ spam posts a day coming from about 10,000 different IP addresses. Requiring registration doesn’t help, as the bots go through registration before posting spam. :(

  • http://www.whiteacid.org WhiteAcid

    This might be stupid, but what about printing a random number into a hidden element on the comments form and checking server-side that the number is correct? That way, when they submit straight to the target of the form, they won’t have the right hidden number.

    If they do in fact read hidden fields and post them too, then print the random number somewhere else in the source code and use JavaScript to copy it over to the hidden field automatically. This obviously won’t work for people who don’t use JS, but that’s hardly anyone.

    That should eliminate anyone not using a browser.
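
    A minimal sketch of the server side of the hidden-number idea, in Python. The class and method names are hypothetical, and the tokens are kept in memory only, which a real site would replace with a session or database store.

```python
import secrets


class FormTokens:
    """Issue one-time random tokens for a comment form and verify them."""

    def __init__(self):
        self._issued = set()

    def issue(self):
        """Create a token to print into the hidden field of the form."""
        token = secrets.token_hex(16)
        self._issued.add(token)
        return token

    def check(self, submitted):
        """Accept a token once; unknown or reused tokens are rejected."""
        if submitted in self._issued:
            self._issued.discard(submitted)
            return True
        return False
```

    This stops a bot that posts straight to the form target without loading the page. A bot that fetches the form first and echoes the hidden field back still passes, which is the case the JavaScript step above is meant to cover.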

  • sunshine

    There are JavaScript Captchas which demand that you have a JavaScript-enabled browser to do the crazy calculation.
    Lucky enough, there are ready-made libraries to implement JavaScript in bots.

  • Pingback: spam huntress » blog archive » Sun Shine on webspam

  • sunshine

    For my post on breaking voice Captchas:
    http://blogs.securiteam.com/index.php/archives/287

    For WhiteAcid’s post on breaking Captchas:
    http://blogs.securiteam.com/index.php/archives/208

  • hmm

    > Wanna work for me? If you can do it
    > in just a few short minutes, you’re
    > either superman or not very attached
    > to reality.

    Not a bad place to be, really, is it?

    I’m not saying it’s a minor problem, but it certainly isn’t “a big deal”. At least IMHO. Especially considering that most of the time it’s the pot calling the kettle black.

    If you bloggers get _really_ desperate just start moderating the posts! I’m sure you’ll find the spam slows down …

  • Pingback: SecuriTeam Blogs » Defeating Voice Captchas

  • http://rathamahata.blogspot.com/ seo black & white

    I think that currently too little attention is paid to OpenID. If it is correctly introduced everywhere, it will reduce the effectiveness of all non-authentic old doorways for some time. If you look at the current spam situation from the link spammer’s (e.g. my) point of view, you will find that many free services are currently used either as beneficiaries or (after a short period in beneficiary-only mode, or even without it) as targets of spammy backlinks. That would not be possible if the owner of a free service secured all backlinks pointing to it by allowing only a list of domains he trusts. It is not a good solution in the long term, though. Maybe OpenID should evolve into something with delayed confirmation, e.g. “yes, I allow you to create backlinks at this particular site pointing to my site”…

  • Pingback: SecuriTeam Blogs » drive-by sites, blog spam/domains and spyware - analysis, examples and facts

  • t.barnett

    i can glance at 50 “posts” and decide pretty quickly which are spam and which are not. 12k/50 is still a lot, but not more than a few minutes work …

    sounds like you are underestimating yourself.

    i am now talking with many who are active in this field, and we are establishing a working group/mailing list to address these issues mitigation-wise operationally, as well as research into new trends, bad guys, etc.

    and to join?

  • thomas barnett

    Ironic!

    “Sorry, but your comment has been flagged by the spam filter running on this blog:…”

    You should consider telling us what to comment so we can pass the spam filter.

    Maybe you are trying too hard to deal with something that is not worth dealing with, and in the process you are annoying your readers. I can, at least, choose to ignore spam!

  • sunshine

    I just made a point about why spam of this kind is not only bad and annoying, but illegal:
    http://blogs.securiteam.com/index.php/archives/290

  • sunshine

    The spam filters save us a bit of work, so why not use them? One false positive a week I can definitely work with. See, I approved both your posts. :)

    You can contact me via email for details on what you asked.

  • http://golem.ph.utexas.edu/~distler/blog/ Jacques Distler

    I’m sorry, but I’m trying to figure out what’s new here.

    Comment spambots have been using open proxies for a year and a half (at least). Using botnets seems to be a more recent innovation. Maybe 6 months old, or so.

    But are the ‘bots themselves any more sophisticated than they used to be? My limited experience seems to indicate not.

    What are you seeing in the current generation of comment spambots that the previous ones didn’t do?

  • sunshine

    simply put, they pass current spam protection a lot more often than they used to.

    In the case of Spam Karma I am not sure if it’s really a new bot as much as just spammers putting URLs in the URL field and not in the post DATA body, but certainly there is a very noticeable increase, and certainly they pass spam protection far too often now.

  • http://golem.ph.utexas.edu/~distler/blog/ Jacques Distler

    Well, you’d expect the effectiveness of a system like Spam Karma to decrease over time, not through any technological breakthrough on the part of the spammers, but due to a gradual shift in their tactics.

    I don’t care about their tactics. What interests me is the state of their technology.

    Are they crafting general-purpose spambots, or ones specialized to particular weblogging systems? Does the spambot merely attempt to POST to the comment script, or does it parse the comment submission <form>? If the latter, what algorithm does it use, when confronted with a form with multiple <input type="submit"> elements?

    And so forth…

    In your writeup, you made it sound as if the current crop of comment spambots is more sophisticated than previous ones. If so, then I am very interested. “Slips past Spam Karma,” however, is not evidence of much of anything.

  • sunshine

    I quite agree on the Spam Karma evasion point you make. I’d add though as to effectiveness that any open system for security such as anti spam or anti virus will always face an inherent problem: whatever they choose to do in their engine is seen by the bad guys.
    We can create a perfect system, but it will be too slow or have too many false positives (not so perfect). The point is that any such program will make trade-off decisions on what should stay and what should go: this slows us down too much, or that adds far better detection… so yep. Spam Karma, as great as it is, is no good example. Especially since I believe some bots passed it by putting the spam URL in the URL field instead of in the message DATA body.

    As to all the other information you seek… we should talk via email and I will share what I have, although time constraints prevent me from investing the time needed to provide all of what you ask.
    There is more information here: http://blogs.securiteam.com/index.php/archives/290

    :)

  • http://golem.ph.utexas.edu/~distler/blog/ Jacques Distler

    I don’t want to take up too much of your valuable time, but — at your convenience — I’d appreciate if you would email me whatever forensics you might have on the current crop of spambots.

  • Pingback: SecuriTeam Blogs » Comment spam and Xanga: create blogs to spam to?

  • http://tabletki.org/free-verizon-ringtone/ Tim

    I like your site alot, keep up job!

  • Pingback: SecuriTeam Blogs » Comment spam? What’s that?

  • Pingback: Nexus Blogs » Blogs - SecuriTeam Blogs

  • Pingback: SecuriTeam Blogs » Advanced targeted comment spam and FP decision making

  • Pingback: xslf.com