Google as an RBL

For those not familiar with RBL, the term means Real-time Blackhole List; it is mainly used for fighting SPAM. I have recently started playing around with Google as an RBL engine. The idea is that if the search term I use gets too many hits, it is likely to be SPAM :)

The danger, of course, is that the term could simply be popular. The trick here is that I’m using something very special as the search term: the IP address of the poster.

The IP address shouldn’t be popular; except for a few rare cases, IP addresses listed on Google are directly related to SPAM – either they are listed under wiki-like sites as being banned, or they appear as mass-comment posters. Simply put, if your IP is listed in Google you must be up to no good.

How good is this method? Nothing is bulletproof, but if you suspect something is SPAM, put the IP in Google and see whether there are hits. Almost all the comment SPAM I filtered out this month had more than 100 hits in Google, while all the non-SPAM had either 0 hits or fewer than 10.
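
As a rough sketch, the rule of thumb above could be written as a small Python function. The function name, the labels it returns, and the "uncertain" middle band are assumptions made for illustration; the thresholds simply mirror the numbers observed this month and are not tuned. The hit count itself has to come from wherever you run the search.

```python
def classify_by_hit_count(hits, spam_threshold=100, clean_threshold=10):
    """Classify an IP address by how many search results mention it.

    The default thresholds mirror the rough numbers quoted in the post;
    they are illustrative, not tuned values.
    """
    if hits < 0:
        raise ValueError("hit count cannot be negative")
    if hits > spam_threshold:
        return "likely spam"
    if hits < clean_threshold:
        return "likely clean"
    # Anything between the two thresholds was not observed in the post,
    # so this sketch refuses to guess.
    return "uncertain"


if __name__ == "__main__":
    for count in (0, 7, 42, 350):
        print(count, classify_by_hit_count(count))
```

Counts that fall between the two thresholds are left as "uncertain" on purpose, since nothing in this month's data says which way they should go.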

BTW: A big advantage of Google is that it is quick, taking only a few seconds to get a response. A disadvantage is that you cannot just “hammer” them with searches or they will block you. Maybe someone can pick up this idea and build an RBL of IP addresses using Google as the back end.
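
One way to avoid hammering the search engine is to remember the answer for each IP and to pause between fresh lookups. The sketch below assumes a caller-supplied `fetch_hits` function that returns the hit count for an IP. That function, the class name, and the five-second default pause are all hypothetical, and nothing here talks to Google directly.

```python
import time


class ThrottledHitLookup:
    """Cache hit counts per IP and space out calls to a rate-limited backend."""

    def __init__(self, fetch_hits, min_interval=5.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch_hits = fetch_hits      # hypothetical callable: ip -> hit count
        self._min_interval = min_interval  # assumed pause between backend calls (seconds)
        self._clock = clock                # injectable so the class can be tested
        self._sleep = sleep
        self._cache = {}
        self._last_call = None

    def hits_for(self, ip):
        # Repeat posters are answered from the cache without a new search.
        if ip in self._cache:
            return self._cache[ip]
        # Wait until at least min_interval has passed since the last search.
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        count = self._fetch_hits(ip)
        self._cache[ip] = count
        return count
```

The cache in this sketch never expires, which is fine for a quick experiment but would need a time limit in anything long-running, since an IP address can change hands.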

  • http://www.ircarsiv.com Arsiv

    i love you Google.

  • http://www.arrowhead.com.mx Rick

    I think it’s a good idea, and the Google fellows will make the adjustments for it to work OK.

    Anything to reduce SPAM!

  • Rich Kulawiec

    First, “RBL” refers only to the blacklist run by MAPS — it’s
    not a generic term. The generic terms are “DNSBL” (for those
    which support lookups on IP addresses) and “RHSBL” (for those
    that support lookups on domains, subdomains, and hosts).

    Second, “SPAM” is a trademark of Hormel; “spam” is unsolicited
    bulk email. Moreover, given that Hormel has been surprisingly
    sanguine about our co-opting of the term, and that it’s not an
    acronym, the all-caps form should never be used.

    Third, one of the many problems with such a scheme is that
    it’s readily susceptible to gaming by spammers, who would,
    if it were widely used, no doubt game it by causing the IP
    addresses of non-spam sources to appear in sufficient quantity
    as to raise the FP rate to the point where it’d be useless.
    This highlights one of the fundamental problems with any
    number of anti-spam proposals: they presume that the enemy
    won’t adapt to them…when of course we all know that they will
    whenever it becomes worth their time/money to do so.
    A second and equally-thorny problem seen here and in other
    proposals is that they rely on input data which can be controlled
    by the enemy.

    A better approach to this problem would be to mirror that used
    by cooperative DNSBLs: combine observations from trusted
    observers at diverse locations, since any sufficiently-prolific
    spammer will be visible to at least a few of them. Moreover,
    the observation points should not be disclosed, and care should
    be taken to obfuscate the timestamps on the observations.
    Granted, a sufficiently-diligent and patient spammer might be
    able to uncover at least part of the operation of such a setup,
    but not without investing considerable effort.

  • Valentin

    for Rich Kulawiec :

    Rich, I’m with you..
    Initially, I thought it was a good idea to “somehow use Google to filter the senders”, but, as you say, they (greedy people..) will kill “the list” if they can’t control it.

    Rich, is there any solution?