Comment Spam: new trends, failing counter-measures and why it’s a big deal
recently, new bots have rendered current anti-spam techniques for blogs almost useless. here is a short write-up on comment spam, referrer spam and what’s currently happening in that area.
i have given the subject of comment spam a lot of thought and done a lot of digging into it, and i came up with a few interesting findings.
if you don’t run a blog (which would make you an expert) or haven’t read about this subject in the past, just google it. you are all smart people.
basically, comment spam is regular spam posted to blogs and other web pages that accept comments, both for plain economic spamming purposes and to improve the ranking of various sites in google and other search engines. the latter is often done by commercial companies that operate openly.
by the end of this post i hope to demonstrate how serious blog spam is, or at the very least that it deserves some extra attention if you dismissed it in the past.
first off, comment spam is abuse. abuse isn’t new, and as soon as a system shows up it will be abused. if not today, then 10 years from now.
it has long been an established yet not widely known fact that if mistakes can happen, they will happen. leaving a potential problem in place just because no one currently exploits it is terrible, and yet it keeps happening.
if the power grid for a significant part of the u.s. can go down once every several years, so can any other system (and that’s if going down is the worst that can happen).
this is relevant to comment spam only in the way it is relevant to every other security-related issue, so why bring it up here?
because comment spam isn’t a new thing either. does anyone remember how big guest books used to be in the previous century?
and what about referrer spam?
some interesting things i’ve noticed about what i’ve now named web spam / web content poisoning, or cspam (for comment spam):
[i’m making a point about how silly it is to give spam a new name every time it jumps to a new medium. what’s your favorite? spit?]
automated spam is spam sent by a bulk-poster (by analogy with bulk-mailer): a program that visits web pages and posts spam to them.
recently we have seen a serious increase in comment spam activity. for example, on one site i recently started helping to maintain, we get over 1,000 spam comments a day. i won’t even start on the referrer spam poisoning we get.
the spam is no longer sent from just one ip address or even just a few. botnets are indeed blossoming in this field.
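to illustrate why this hurts, here is a minimal sketch of a naive per-ip rate limiter, the kind of defense that works against a single noisy host. the limit of 5 comments per window and all the names are hypothetical choices for illustration, not any particular blog engine’s code. the same 1,000 spam comments that are almost all refused when they come from one address all get through when each comes from a different bot.

```python
from collections import defaultdict

# hypothetical limit: at most 5 comments per source ip per time window
MAX_PER_WINDOW = 5


class PerIpRateLimiter:
    """naive per-ip defense: count comments from each ip and refuse the excess.

    window expiry is omitted for brevity; a real limiter would reset the
    counts periodically (e.g. every hour).
    """

    def __init__(self, max_per_window=MAX_PER_WINDOW):
        self.max_per_window = max_per_window
        self.counts = defaultdict(int)  # ip -> comments seen in this window

    def allow(self, ip):
        self.counts[ip] += 1
        return self.counts[ip] <= self.max_per_window


def accepted(source_ips):
    """return how many comments from the given source ips get through."""
    limiter = PerIpRateLimiter()
    return sum(1 for ip in source_ips if limiter.allow(ip))


# 1000 spam comments from one host vs. the same 1000 spread over 1000 bots
# (addresses are from documentation/private ranges, not real attackers)
single_host = ["192.0.2.1"] * 1000
botnet = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]

print(accepted(single_host))  # 5
print(accepted(botnet))       # 1000
```

any defense keyed only to the source address degrades the same way once the posting is spread over enough machines.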
apparently, there is a new bot out there that gets past the defenses that have worked so far. moreover, anti-spam technology in this realm is in no way mature or proven. mostly it consists of heroic and very impressive efforts by individuals who got annoyed with the spam on their own blogs.
so far these efforts have been rather successful, but that window is closing.
for example, spammers have started using a technique that quotes the last paragraph of your text, or opens the comment with something relevant, and then adds:
“oh, by the way, have you tried viagra?”
on other occasions we see spam comments describing how the poster searched the web for law-related material but ended up here, followed by “btw, if you are also interested in law… check out this page!”
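the quoting trick can be caught, at least in its crudest form, by checking whether a comment reproduces the post’s last paragraph verbatim and also carries a link. this is a minimal sketch of one such hypothetical heuristic (the function names and the 40-character threshold are assumptions), not a description of any existing plugin; it does nothing against the “relevant opener” variant, which would need fuzzier matching.

```python
import re

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def normalize(text):
    """lowercase and collapse whitespace so small formatting changes don't matter."""
    return " ".join(text.lower().split())


def quotes_post_and_links(post_text, comment_text, min_quote_chars=40):
    """flag a comment that copies the post's last paragraph verbatim and carries a link.

    hypothetical heuristic: quoting the post is a cheap way for a bot to look
    on-topic, so a verbatim quote plus a url is treated as suspicious.
    """
    paragraphs = [p for p in post_text.split("\n\n") if p.strip()]
    if not paragraphs:
        return False
    last = normalize(paragraphs[-1])
    if len(last) < min_quote_chars:
        return False  # too short to count as a meaningful quote
    return last in normalize(comment_text) and bool(URL_RE.search(comment_text))


post = ("first paragraph about patch management.\n\n"
        "the last paragraph explains why patching early matters so much.")
spam = ("the last paragraph explains why patching early matters so much. "
        "oh, by the way, have you tried http://pharmacy.example/ ?")
ham = "good point about patching early, i agree."

print(quotes_post_and_links(post, spam))  # True
print(quotes_post_and_links(post, ham))   # False
```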
my all-time favorites are the posts that say:
“great blog! keep up the good work!”
“i liked what you’ve done here, keep it up!”
and so on, with the spam url entered as the commenter’s homepage, which is linked from their nickname.
recently we have even seen one post that had:
“where do i find the rss feed for this blog?”
sometimes it is very difficult to avoid false positives even with a skilled human doing this full-time.
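the “great blog!” pattern can be expressed as a rule: a short, content-free comment whose only payload is the homepage link. below is a minimal sketch of such a rule; the phrase list, the 15-word cutoff and the function names are hypothetical, drawn from the examples above rather than from any real filter.

```python
# hypothetical phrase list drawn from the examples above; a real list would
# need constant tuning and would still miss new wordings
GENERIC_PHRASES = (
    "great blog",
    "keep up the good work",
    "keep it up",
    "i liked what you've done here",
    "where do i find the rss feed",
)


def is_generic(comment_text, max_words=15):
    """true for short comments built around a stock compliment or question."""
    # fold typographic apostrophes so "you’ve" matches "you've"
    text = comment_text.lower().replace("\u2019", "'")
    return len(text.split()) <= max_words and any(p in text for p in GENERIC_PHRASES)


def looks_like_nickname_spam(comment_text, homepage_url):
    """flag a content-free comment whose only payload is the homepage link."""
    return bool(homepage_url) and is_generic(comment_text)


print(looks_like_nickname_spam("great blog! keep up the good work!", "http://pills.example/"))  # True
print(looks_like_nickname_spam("great blog! keep up the good work!", ""))                       # False
```

the catch is exactly the false-positive problem: a genuine reader who links to their own blog and writes “great blog!” gets flagged too, so a rule like this is better used to queue comments for human review than to delete them outright.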
another type of spam we see is manual spam.
people visit the page with an actual browser and type the spam by hand. how much does a skilled illegal alien worker cost per day?
one such spam was recently posted on the site i mentioned (guess which one), on a blog entry about symantec. it talked about symantec, then suddenly changed tone, said that symantec’s anti-spam (of all things) had failed them, and suggested a competitor that had worked for them.
looking at the attacking bots, this is roughly what we find these days:
45% open proxies
40% compromised machines
15% misc
(i haven’t actually calculated the numbers, but that’s roughly right)
“misc” being anything from a completely open vnc server installation to... your guess is as good as mine.
examples of captured spam and google-poisoning attempts are abundant, so i won’t bore you with them. suffice it to say that every blog gets very specific spam relating to its topic, on top of the usual peaks in this or that type of spam. lately the house special is pharmacy spam.
referrer spam is still mostly about porn.
looking at gangs, we managed, for example, to identify a very big eastern european gang (probably one noisy guy or gal), but when they noticed our attention they disappeared for a while.
another important point is the domains used. much like with email spam, these change very frequently and seem to be registered in bulk. i have no doubt these are the same people who send email spam.
i am now talking with many people active in this field, and we are establishing a working group/mailing list to handle operational mitigation of these issues, as well as research into new trends, the bad guys, etc.
some of the solutions already proposed that we are working on include better blacklisting services, handling the different types of such poisoning in web applications together (from comments to referrers), and other things i’d rather not discuss until they are a bit clearer.
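one way to picture a blacklist that covers several kinds of poisoning at once: the same list of spam domains is consulted both for links in comments (including the commenter’s homepage field) and for the referer header of incoming requests. this is only a minimal sketch under assumptions: the local `BLACKLIST` set and all function names are hypothetical placeholders, not the working group’s design; a real service would be queried remotely and updated constantly, since the domains rotate so quickly.

```python
import re
from urllib.parse import urlsplit

# hypothetical shared blacklist of spam domains (example.* names are placeholders)
BLACKLIST = {"pills.example", "casino.example"}

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def domain_of(url):
    """return the lowercased host name of a url, without any port ('' if none)."""
    return urlsplit(url).hostname or ""


def is_blacklisted(domain, blacklist=BLACKLIST):
    """match the domain itself or any parent domain (www.pills.example -> pills.example)."""
    labels = domain.split(".")
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))


def check_comment(text, homepage=""):
    """true if any link in the comment body or the homepage field is blacklisted."""
    urls = URL_RE.findall(text)
    if homepage:
        urls.append(homepage)
    return any(is_blacklisted(domain_of(u)) for u in urls)


def check_referrer(referer_header):
    """true if the request's referer header points at a blacklisted domain."""
    return bool(referer_header) and is_blacklisted(domain_of(referer_header))


print(check_comment("nice post! see http://www.pills.example/cheap"))  # True
print(check_referrer("http://casino.example/?ref=1"))                 # True
print(check_referrer("https://blogs.securiteam.com/"))                # False
```

matching parent domains as well as exact names is a deliberate choice here: bulk-registered spam domains often show up with throwaway subdomains, and an exact-match list would miss them.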
i hope i managed to convince some people of how big this really is. we have all heard of blog spam; i and many people around me just didn’t realize its scale until we started working on it.
i figured it’s time to let others know as well.
something can be done about this now to make it less of a threat in the coming years. i bet, though, that most of us will wait until we have to put it out like a fire, letting it keep evolving and come back to haunt us.
if i haven’t convinced you of the risks yet, there have already been successful worms exploiting such techniques. some examples:
my previous post on blog attacks.
matthew murphy’s post on the xanga worm.
i will post updates on my (and our) findings on this subject on the securiteam blogs site (http://blogs.securiteam.com/).
this quick & dirty write-up can be found here: