Defeating Voice Captchas
February 13th, 2006 by SecuriTeam, Filed under: Web, Commentary, Spam
One of the newest (though by now well-known) tricks in the Captcha book is using voice.
If users cannot make out the letters in today's overly complex Captchas, which the arms race with spammers has forced on us, they can click an icon and listen to them instead.
Here is the earliest example of it that I know of:
http://www.notonebit.com/projects/killbot/kbaudio.php
That example is a bit amateurish: the recording is bad and obviously not done by a girl with a sexy voice. Still, the noise from the bad microphone can either be filtered out or left in; it doesn’t matter either way.
In this case each letter is played by itself. Further, each letter was recorded only once.
So how many times would one have to refresh the page and listen to the Captcha before being able to identify each letter simply by, say, an MD5 hash of its audio?
Even if it were all in one audio file, and even if the audio were altered (played at a higher pitch, for example) or several different voices greeted us, looking for general similarities in the audio itself would be enough to break this Captcha once enough harvested samples (not that many, really) had been saved.
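Exact hashes stop matching as soon as the audio is altered even slightly, so the "general similarities" idea can be sketched instead as nearest-template matching. This is a toy illustration, not a known attack tool: it assumes the clips have already been decoded to lists of PCM samples (for instance with Python's `wave` module) and cut into one clip per letter. The names `envelope` and `NearestTemplate` are hypothetical, and the coarse amplitude envelope used here only cancels out overall volume changes; a serious attempt would more likely compare spectral features.

```python
import math


def envelope(samples: list[float], segments: int = 16) -> list[float]:
    """Coarse amplitude envelope: mean |sample| in each of `segments` equal
    slices, scaled so the loudest slice is 1.0 (cancels overall volume)."""
    n = len(samples)
    if n < segments:
        raise ValueError("clip too short for the requested number of segments")
    env = []
    for i in range(segments):
        chunk = samples[i * n // segments:(i + 1) * n // segments]
        env.append(sum(abs(s) for s in chunk) / len(chunk))
    peak = max(env)
    return [e / peak for e in env] if peak > 0 else env


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two envelopes of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestTemplate:
    """Labels a clip with the letter of the most similar harvested clip."""

    def __init__(self, segments: int = 16) -> None:
        self.segments = segments
        self.templates: list[tuple[list[float], str]] = []

    def learn(self, samples: list[float], label: str) -> None:
        self.templates.append((envelope(samples, self.segments), label))

    def classify(self, samples: list[float]) -> str:
        env = envelope(samples, self.segments)
        best_env, best_label = min(self.templates, key=lambda t: distance(env, t[0]))
        return best_label
```

Each harvested, human-labeled clip becomes a template; a new clip is assigned the label of its closest template, so quieter or louder replays of the same recording still match even though their bytes (and hashes) differ.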
Auto-generated voice? That sounds easy to beat, but I am not an audio expert, so “sounds like” will have to remain my opinion.
It is great to finally be able to understand these annoying new Captchas, but we are already reaching a point where the recorded speech can’t be understood either, as Captchas grow ever more difficult in response to spammers’ counter-measures.
For information on breaking regular text-image Captchas, check:
WhiteAcid’s post
Wikipedia
For my post on new comment spam problems, see:
http://blogs.securiteam.com/index.php/archives/285
Update from our friend Valdis Kletnieks, who points out that voice recognition technology would handle this easily, even though it is not really necessary at this time:
“Given that voice recognition can currently do up to 160 words/minute continuous speech over a 50K word vocabulary at up to 99% accuracy, on commodity hardware, I would think that recognizing 36 letters/numbers would be a no-brainer.”
Gadi Evron,
ge@linuxbox.org.
Let’s see you break it. I’ve had great success with CAPTCHAS and may try this audio one. Looks good, thanks.
[…] So the alternative is to give a version that is useful for the blind, which is an audio version (assuming their text based reader can handle sound files). The audio version reads a series of numbers that they are to transcribe into the box on the page. There are a few problems with this. The first being, you have now made a secondary transmission source for the same access key (we’ll get back to that in a second). The second problem is some businesses would like to store that information that the user went to the audio version for security purposes, or for customization/personalization in the future. Well, hate to throw a wrench into that idea, but that now forces you to be HIPAA (Health Insurance Portability and Accountability Act) compliant (at least in the United States) because you are now storing potentially sensitive medical information about people. Now you are liable under that act if you aren’t taking huge measures to insure compliance. Lovely, huh? […]
What about audio CAPTCHAs like Google’s, which even I can’t understand?