unCAPTCHA AI Cracks Google reCAPTCHAs with 90% Accuracy

A proof-of-concept from the University of Maryland can defeat the audio challenges that are offered as an option for people with disabilities.

unCAPTCHA, an artificial intelligence-based automated system designed at the University of Maryland, has been updated to break Google’s latest audio-based reCAPTCHA challenges with an accuracy rate of 90 percent.

For years, Google has been refining and strengthening reCAPTCHA, a Turing test-based method for proving that website users aren’t robots. It typically challenges users that it suspects of being bots by asking them to read distorted text and type it into a box, or to select groups of pictures that have something in common. Audio challenges, which consist of sequences of recorded voices, are offered as an option for people with disabilities. Users are simply asked to type in what they hear.

Last year, the University of Maryland team cracked the audio mechanism with unCAPTCHA, which combines free, public, online speech-to-text engines, including Google’s own, with a phonetic mapping technique. The system downloads the audio challenge, breaks it into several digital audio clips, and runs each clip through several speech-to-text systems. It then maps exact and near-homophones in the transcriptions to candidate answers, weights the aggregated results by confidence level, and sends the most probable answer back to Google.
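
The aggregation step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the researchers' code: the homophone table, the function names (`to_digit`, `vote`, `solve`) and the confidence values are all hypothetical, and the transcripts are supplied by hand rather than fetched from real speech-to-text services.

```python
from collections import defaultdict

# Hypothetical near-homophone table (an assumption for illustration;
# the actual phonetic mapping is not given in this article).
HOMOPHONES = {
    "one": "1", "won": "1",
    "two": "2", "to": "2", "too": "2",
    "four": "4", "for": "4", "fore": "4",
    "eight": "8", "ate": "8",
}

def to_digit(transcript):
    """Map one engine's transcript of a clip to a digit, or None if no match."""
    word = transcript.strip().lower()
    if len(word) == 1 and word.isdigit():
        return word
    return HOMOPHONES.get(word)

def vote(candidates):
    """Pick the digit with the highest summed confidence.

    candidates: list of (transcript, confidence) pairs, one per engine,
    all for the same audio clip.
    """
    scores = defaultdict(float)
    for transcript, confidence in candidates:
        digit = to_digit(transcript)
        if digit is not None:
            scores[digit] += confidence
    if not scores:
        return None
    return max(scores, key=scores.get)

def solve(clips):
    """Combine per-clip votes into one answer; '?' marks an unresolved clip."""
    return "".join(vote(clip) or "?" for clip in clips)

# Made-up engine outputs for a three-clip challenge.
clips = [
    [("won", 0.5), ("1", 0.4)],
    [("ate", 0.7), ("two", 0.2)],
    [("for", 0.6), ("floor", 0.9)],
]
print(solve(clips))
```

Summing confidence per candidate lets two moderately confident engines that agree outvote a single engine that returns an unmappable word with high confidence, as with "floor" in the last clip.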

At the time, reCAPTCHA used digits: The recorded voice would say a series of numbers that the user would then need to type in. In a trial of 450 reCAPTCHA challenges, the AI solved them with 85.15 percent accuracy in an average of 5.42 seconds each. That’s less time than it takes to listen to the challenge in the first place.

But earlier this year Google updated reCAPTCHA, changing the audio challenges so that they use words rather than numbers, and improving browser-based automation detection. In response, the team updated unCAPTCHA to crack the new version too. The updated system, unCAPTCHA2, uses the same speech-to-text approach, and adds a screen clicker that moves to certain pixels on the screen and navigates the page like a human.

The system now has a 90 percent accuracy rate, researchers said.

The update was made in June, but the researchers waited until late December before going public. According to the unCAPTCHA GitHub repository, Google has deemed the weakness to be out-of-scope for its bug-bounty program, and it’s unclear whether the company plans to address the attack.

“We have been in contact with the ReCAPTCHA team for over six months and they are fully aware of this attack,” the researchers wrote. “The team has allowed us to release the code, despite its current success.”

Google did not immediately respond to a request for comment from Threatpost.

The researchers noted that the proof-of-concept shows that bad actors don’t need significant resources to mount a successful large-scale attack on the reCAPTCHA system.

“Prior work has generally assumed that attackers against CAPTCHA systems are well-resourced,” the researchers said in their original paper. “In particular, the standard threat model involves an attacker who can attack the CAPTCHA tens or hundreds of thousands of times for a relatively small number of successes, and can scale this attack to abuse services.”

They added, “An attacker with many resources can afford a lower success rate, and thus some have argued that even a success rate of 1/10,000 is sufficient to threaten the integrity of services. In our work, we will assume an attacker with limited resources; unlike previous works attacking captchas, our threat model limits the attacker to one computer, one IP address, a small amount of RAM and limited training data (less than 100MB). Therefore, we aim for accuracy benchmarks above 50%, as a low-resource attacker cannot afford a lower percentage of success.”
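
The arithmetic behind that threat model can be made concrete with a short sketch. It assumes each attempt succeeds independently with a fixed probability, so the expected number of attempts per success is 1/p; that independence assumption and the helper name `attempts_per_success` are illustrative and do not come from the paper. The success rates and the 5.42-second figure are the ones quoted in this article.

```python
def attempts_per_success(p):
    """Expected attempts until the first success, assuming independent tries."""
    if not 0 < p <= 1:
        raise ValueError("success probability must be in (0, 1]")
    return 1.0 / p

# Success rates mentioned in the article.
rates = {
    "well-resourced attacker (1/10,000)": 1 / 10_000,
    "low-resource benchmark (50%)": 0.50,
    "original unCAPTCHA (85.15%)": 0.8515,
    "updated unCAPTCHA (90%)": 0.90,
}

for label, p in rates.items():
    print(f"{label}: about {attempts_per_success(p):,.2f} attempts per success")

# Rough time per solved challenge for the original system, assuming failed
# attempts also take the average 5.42 seconds.
seconds_per_success = 5.42 * attempts_per_success(0.8515)
print(f"about {seconds_per_success:.2f} seconds per solved challenge")
```

Under these assumptions, an attacker succeeding once in 10,000 tries needs on the order of 10,000 requests per solved challenge, which is hard to sustain from a single IP address, while a 90 percent success rate needs only slightly more than one.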
