Revenge of the Captcha! (Reverse Captcha, Ransom Notes and Image Spam)

thanks to jeff chan for the image.

for months now, images have been showing up more and more in spam, reaching 30 to 40 per cent of all spam. counter-measures have been in play for a while, developed by many different folks, some we know, some we don’t: from system administrators developing signatures to a team at spamassassin working on an ocr system to break these images and check their text for spamishness.

when we first encountered these images, a friend of mine was as excited as i was: “why, it’s exactly like a captcha, only in reverse!”

hence the term i just coined – reverse captcha.

as it’s a cat and mouse game of escalations and counter-measures between bad guys and good guys, the bad guys learn and make our lives more difficult. i will try to explain what a reverse captcha is to me (and no, it’s not a special type of turing test, although we touch on that below).

captchas are the annoying images (or audio clips) you are asked to look at or listen to, and then type in the characters you see or hear, to verify you are a human rather than a bot. there are more advanced types of captchas, but you get my drift. captchas are often described as a “reverse turing test”.

i suppose the spammers behind these images liked the idea, and as most spam filters that aren’t based on black/gray listing are content-based, using images to deliver the spammy message makes sense. a computer can’t read it.
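to make that concrete, here is a minimal sketch in python of a content filter that only scores the text parts of a message. the keyword list, the function names and the placeholder image bytes are all made up for illustration; real filters are far more elaborate. the point is only that once the pitch moves into an attached image, a text-only score drops to zero.

```python
from email.message import EmailMessage

# hypothetical keyword list, purely for illustration
SPAMMY_WORDS = ("stock", "viagra", "buy now")


def text_score(msg):
    """count spammy keywords in the text/plain parts only (images are ignored)."""
    score = 0
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            body = part.get_content().lower()
            score += sum(body.count(word) for word in SPAMMY_WORDS)
    return score


def make_message(text, image_bytes=None):
    """build a test message, optionally with an attached image."""
    msg = EmailMessage()
    msg["Subject"] = "hello"
    msg.set_content(text)
    if image_bytes is not None:
        msg.add_attachment(image_bytes, maintype="image",
                           subtype="gif", filename="note.gif")
    return msg


# the same pitch, once as text and once hidden inside an image
plain_spam = make_message("buy now, hot stock tip")
image_spam = make_message("hi there", image_bytes=b"GIF89a placeholder")

print(text_score(plain_spam))  # the text version trips the keyword filter
print(text_score(image_spam))  # the image version sails through
```

the filter never looks inside the gif, which is exactly the gap ocr-based counter-measures try to close.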

this write-up, which came up with the same term “reverse captcha” not very long ago, describes potential tests that computers can pass but humans can’t:

this is an interesting variation on the turing test, in which humans generate and grade tests that most humans can pass, but current computer programs cannot pass. is there another variation in the future, in which computers generate and grade tests that computers can pass, but humans cannot pass?

in our case however, the “reverse captcha” is the same kind of captcha used on web sites to prevent spam bots from posting. it still tests whether the reader is human, only the computer being kept out is not a remote attacker (or multiple attackers) but a local, centralized filter. the main difference, other than avoiding a single guard rather than hordes of bots, is that the captcha now helps the bad guys. it’s used by the other side.

naturally, the spam filters started utilizing ocr, much like the spammers’ bots did when trying to bypass our protections. so, our lovely friends the spammers started obfuscating their spam messages, even creating images that look like ransom notes in an attempt to get through these filters.

indeed, it is not the technology which is evil, it is its uses. i find this battle extremely interesting, and participate as well as observe as much as i can.

nowadays ascii art spam is seen more and more often. new tricks are invented daily, as well as new battlefields. the idea for “revenge of the captcha” was taken from coderman on the funsec list.

gadi evron,

  • Lev

    I love the ASCII art ones :)

    ah  us    ty    mm      al  td  ep  vh    ag 
    su  ga   zeee   ok      yk  ch  eq  jrg  ymp 
    fd  vj  tg  yc  jv      yo  vp  km  zgdadmrv 
    dh  kr  lclyea  fp      wv  ja  au  ln bh bn 
     gmdu   nw  vs  he  kb  ay  nq  ak  fa    ko 
      lt    er  pu  kqvyfs  pz   ovsc   dn    oj 
    ri  lh  yp    jc     fvzw   zogle     kq     
    an  tq  wj   egpk   kz  se  fv  vp   hblm    
    cr  ju  ls  zr  ji  uj      jj  ko  nc  ji   
    wd  rf  sw  cwwrqj  ui ecx  mvhue   lexkdx   
     evnu   fz  ha  ei  gc  ef  og  py  lc  dk   
      mf    fl  qd  bp   puom   ot  vk  si  po   
     fypp   xi    ze    ki      qz   lwgy        
    in  pt  xq   gevp   bq      ke  at  zj       
    hp      ym  mh  pq  pe      ur  kik          
    tc      lj  ptecon  lt      iq     iui       
    nr  qq  nm  it  lo  cw  bt  bo  ly  qt       
     rlpn   is  ii  pk  nteynd  fa   wpti        
    rv   lg    pa    dn  gt    jj    hu   wu     
     vf ju    gfmp   xlj bh   bifl    fp ly      
      bfr    wq  ij  xgnbzk  xo  kp    aqx       
      tiu    ffrdpe  btzppp  cbisjv    lus       
     os pa   fq  re  vq zdh  zq  lq   au ac      
    ov   db  oh  el  od  nj  qa  qs  ll   mb     

  • Phil

    The latest stock pump and dump ones are much prettier. They’ve moved to multicoloured backgrounds and light coloured text, which “wobbles” all over the place. A free work of art with every spam!

    But the latest FuzzyOCR V3.4.2 plugin for SpamAssassin catches them with a simple additional scanset:

    ocrad -s 5 -i

    does the trick nicely, at least on the ones I tried.

  • Deapesh Misra

    Thus then, is a ‘Reverse CAPTCHA’ a ‘reverse-reverse-Turing test’ == ‘Turing test’?

    Turing Test -> (reversed) -> CAPTCHA
    CAPTCHA -> (reversed) -> Turing Test

    But as you mention, it is only the intent (CAPTCHA images with good intent becoming spam images with bad intent) which is reversed in image based CAPTCHA.

    Thus calling ‘image based spam’ a ‘reverse CAPTCHA’ will muddle up the definition. This new nomenclature won’t clearly mean the reversal in intent.

  • sunshine

    It is reverse in the sense that it is used, indeed, for bad rather than good, but more than that:
    1. It “attacks” the server rather than the clients.
    2. It is used to trick a pull mechanism rather than a push one.

    Perhaps we need a new word rather than “Reverse”, although I like it. See, a Captcha may be a reverse Turing test of sorts, but it is used currently in the wild. A Turing test for machines, well… that’s defined already?

  • Dennis

    After looking at the current spam picture, I decided to implement grey-listing at my company. The effect was awesome. The trade for a couple of days of delayed email was an 80% reduction in the overall spam level and a complete defeat of the image spam, as it is delivered by bots that spew and move on. The next step was to deploy OCR technology, but grey-listing has me wondering if it is necessary for the next few months.

  • Brad Knowles

    We did greylisting a while back on two sites I help manage. It helped for a while, but spammers have long since learned how to easily get around it — just keep track of which attempts resulted in temporary failures and retry at a later time.

    No real queueing is involved, and the additional work by the botnets is trivial.

    We’re right back to where we were. In fact, now we’re worse off, because the spammers have built even larger botnets that are harder for providers to detect and eliminate, which means that they can send out 10-100 times as much spam as they could just a few months ago.

    IMO, the key difference here is that you don’t actually have to read the CAPTCHA to recognize that one is being used, and to use this as a strong indication that the message in question is actually spam.

    It’s one thing to recognize that something is present, it’s quite another to be forced to actually recognize and understand precisely what it is. A simple light sensor can tell you whether it’s light or dark outside. But it takes a much more sophisticated device to tell you if the light is coming from the Sun or from a bright flashlight.
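
The greylisting exchange between Dennis and Brad above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how any particular mail server implements it: the `Greylist` class, the (client IP, sender, recipient) key and the 300 second minimum delay are all hypothetical choices made for the example. The first attempt from an unknown triplet gets a temporary failure; a later retry from the same triplet is accepted.

```python
class Greylist:
    """Toy greylist: temporarily reject the first attempt from an unknown triplet."""

    def __init__(self, min_delay=300):
        # min_delay (seconds) is an assumed value, not a recommendation
        self.min_delay = min_delay
        self.first_seen = {}  # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient, now):
        key = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "250 accepted"
        return "451 try again later"  # 451 is SMTP's temporary failure code


gl = Greylist(min_delay=300)

# A "spew and move on" bot: one attempt, never retried, never delivered.
spew = gl.check("192.0.2.10", "a@example.com", "victim@example.org", now=0)

# A bot that remembers temporary failures and retries later gets through.
first_try = gl.check("192.0.2.20", "b@example.com", "victim@example.org", now=0)
retry = gl.check("192.0.2.20", "b@example.com", "victim@example.org", now=600)

print(spew, first_try, retry, sep="\n")
```

The only thing separating the two bots is one remembered failure and one retry, which is why the extra work Brad describes is trivial for a botnet.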