Blind Spots in AI Just Might Help Protect Your Privacy

Researchers have found a potential silver lining in so-called adversarial examples, using it to shield sensitive data from snoops.
A computer wears blinders. Illustration: Elena Lacey

Machine learning, for all its benevolent potential to detect cancers and create collision-proof self-driving cars, also threatens to upend our notions of what's visible and hidden. It can, for instance, enable highly accurate facial recognition, see through the pixelation in photos, and even—as Facebook's Cambridge Analytica scandal showed—use public social media data to predict more sensitive traits like someone's political orientation.

Those same machine-learning applications, however, also suffer from a strange sort of blind spot that humans don't—an inherent bug that can make an image classifier mistake a rifle for a helicopter, or make an autonomous vehicle blow through a stop sign. Those misclassifications, known as adversarial examples, have long been seen as a nagging weakness in machine-learning models. Just a few small tweaks to an image or a few additions of decoy data to a database can fool a system into coming to entirely wrong conclusions.
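
To make those "small tweaks" concrete, here is a minimal sketch of one standard way to generate an adversarial example, the fast gradient sign method, written in Python with PyTorch. The tiny model, random "image," and epsilon value are illustrative stand-ins, not code from any of the systems described in this story.

    import torch
    import torch.nn as nn

    # Toy stand-ins: a tiny classifier and a random "image."
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    loss_fn = nn.CrossEntropyLoss()
    image = torch.rand(1, 1, 28, 28, requires_grad=True)
    true_label = torch.tensor([3])

    # Compute the gradient of the loss with respect to the input pixels.
    loss = loss_fn(model(image), true_label)
    loss.backward()

    # Nudge every pixel slightly in the direction that increases the loss.
    epsilon = 0.05  # small enough that a human barely notices the change
    adversarial_image = (image + epsilon * image.grad.sign()).clamp(0.0, 1.0)

The perturbed image looks essentially identical to a person, but the accumulated nudges can be enough to push the classifier's output to the wrong label.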

Now privacy-focused researchers, including teams at the Rochester Institute of Technology and Duke University, are exploring whether that Achilles' heel could also protect your information. "Attackers are increasingly using machine learning to compromise user privacy," says Neil Gong, a Duke computer science professor. "Attackers share in the power of machine learning and also its vulnerabilities. We can turn this vulnerability, adversarial examples, into a weapon to defend our privacy."

A Dash of Fake Likes

Gong points to Facebook's Cambridge Analytica incident as exactly the sort of privacy invasion he hopes to prevent: The data science firm paid thousands of Facebook users a few dollars each for answers to political and personal questions and then linked those answers with their public Facebook data to create a set of "training data." When the firm then trained a machine-learning engine with that dataset, the resulting model could purportedly predict private political persuasions based only on public Facebook data.

Gong and his fellow Duke researcher Jinyuan Jia wondered if adversarial examples could have prevented that breach of privacy. If changing only a few pixels in a photo can trick a machine-learning-trained image recognition engine into confusing a rabbit and a turtle, could adding or subtracting a few Facebook likes from someone's profile similarly exploit machine learning's weaknesses?

To test that hypothesis, the Duke researchers used an analogous data set: reviews in the Google Play store. To mirror Cambridge Analytica, they collected thousands of ratings in Google's app store submitted by users who had also revealed their location on a Google Plus profile. They then trained a machine-learning engine with that data set to try to predict the home city of users based only on their app ratings. They found that based on Google Play likes alone, some machine-learning techniques could guess a user's city on the first try with up to 44 percent accuracy.
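
As a rough illustration of that setup (not the Duke team's code or data), an attacker's model might look like the scikit-learn sketch below, which tries to predict a synthetic "home city" label from a synthetic matrix of app ratings.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_users, n_apps, n_cities = 2000, 500, 25
    ratings = rng.integers(0, 6, size=(n_users, n_apps))  # 0 = unrated, 1-5 stars
    cities = rng.integers(0, n_cities, size=n_users)      # the private attribute

    X_train, X_test, y_train, y_test = train_test_split(
        ratings, cities, test_size=0.2, random_state=0)
    attacker = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("attacker's first-guess accuracy:", attacker.score(X_test, y_test))

With real, correlated data the researchers reported up to 44 percent first-guess accuracy; with purely random synthetic data the score hovers near chance, but the shape of the pipeline is the same.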

Once they'd built their machine-learning engine, the researchers tried to break it with adversarial examples. After tweaking the data a few different ways, they found that adding just three fake app ratings chosen to statistically point to an incorrect city, or removing a few revealing ratings, introduced enough noise to drop the accuracy of their engine's predictions back to no better than a random guess. They called the resulting system AttriGuard, a reference to guarding the data's private attributes against machine-learning snoops. "With just a few changes, we could perturb a user’s profile so that an attacker’s accuracy is reduced back to that baseline," Gong says.
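
A heavily simplified version of that defensive idea, written against the hypothetical attacker sketch above, could be a greedy search: change at most a few ratings so the attacker's model lands on a decoy city. This is a loose sketch, not AttriGuard's actual optimization.

    def perturb_profile(profile, decoy_city, model, max_changes=3):
        """Greedily add or delete ratings until `model` predicts `decoy_city`."""
        profile = profile.copy()
        decoy_index = list(model.classes_).index(decoy_city)
        for _ in range(max_changes):
            if model.predict([profile])[0] == decoy_city:
                break  # the attacker's model is already fooled
            best_app, best_rating, best_score = None, None, -np.inf
            for app in range(len(profile)):
                for new_rating in (0, 5):  # try deleting a rating or adding a 5-star one
                    if new_rating == profile[app]:
                        continue
                    trial = profile.copy()
                    trial[app] = new_rating
                    score = model.predict_proba([trial])[0][decoy_index]
                    if score > best_score:
                        best_app, best_rating, best_score = app, new_rating, score
            profile[best_app] = best_rating  # apply the single most persuasive change
        return profile

    # Example: push one user's profile toward a decoy city.
    defended = perturb_profile(X_test[0], decoy_city=(y_test[0] + 1) % n_cities, model=attacker)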

The cat-and-mouse game of predicting and protecting private user data, Gong admits, doesn't end there. If the machine-learning "attacker" is aware that adversarial examples may be protecting a data set from analysis, he or she can use what's known as "adversarial training"—generating their own adversarial examples to include in a training data set so that the resulting machine-learning engine is far harder to fool. But the defender can respond by adding yet more adversarial examples to foil that more robust machine-learning engine, resulting in an endless tit-for-tat. "Even if the attacker uses so-called robust machine learning, we can still adjust our adversarial examples to evade those methods," says Gong. "We can always find adversarial examples that defeat them."
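
In code, the attacker's countermove might look something like the rough sketch below, again reusing the hypothetical models above rather than any published recipe: generate perturbed profiles, add them back to the training set with their true labels, and refit.

    # Perturb a small batch of training profiles the way a defender would...
    batch = 20
    perturbed = np.array([
        perturb_profile(x, decoy_city=(y + 1) % n_cities, model=attacker)
        for x, y in zip(X_train[:batch], y_train[:batch])
    ])
    # ...then retrain on the originals plus the perturbed copies, labeled with the truth.
    X_robust = np.vstack([X_train, perturbed])
    y_robust = np.concatenate([y_train, y_train[:batch]])
    robust_attacker = LogisticRegression(max_iter=1000).fit(X_robust, y_robust)

The defender's response is then to search for fresh perturbations that fool robust_attacker too, which is what keeps the tit-for-tat going.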

To Wiretap a Mockingbird

Another research group has experimented with a form of adversarial example data protection that's intended to cut short that cat-and-mouse game. Researchers at the Rochester Institute of Technology and the University of Texas at Arlington looked at how adversarial examples could prevent a potential privacy leak in tools like VPNs and the anonymity software Tor, designed to hide the source and destination of online traffic. Attackers who can gain access to encrypted web browsing data in transit can in some cases use machine learning to spot patterns in the scrambled traffic that allow a snoop to predict which website—or even which specific page—a person is visiting. In their tests, the researchers found that the technique, known as web fingerprinting, could identify a website among a collection of 95 possibilities with up to 98 percent accuracy.
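
Web fingerprinting treats each encrypted browsing session as a feature vector, typically built from packet sizes, directions, and timing, and trains an ordinary classifier on it. The sketch below uses synthetic traces and a random forest purely for illustration; it is not the researchers' pipeline.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    n_sites, traces_per_site, trace_len = 95, 40, 200

    # Give each site a characteristic pattern of packet lengths, plus noise,
    # standing in for real captures of encrypted Tor or VPN traffic.
    site_profiles = rng.integers(-1500, 1500, size=(n_sites, trace_len))
    X = np.repeat(site_profiles, traces_per_site, axis=0) + rng.normal(
        0, 50, size=(n_sites * traces_per_site, trace_len))
    y = np.repeat(np.arange(n_sites), traces_per_site)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=1)
    fingerprinter = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)
    print("fingerprinting accuracy:", fingerprinter.score(X_te, y_te))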

The researchers guessed that they could add adversarial example "noise" to that encrypted web traffic to foil web fingerprinting. But they went further, attempting to short-circuit an adversary's circumvention of those protections with adversarial training. To do so, they generated a complex mix of adversarial example tweaks to a Tor web session—a collection of changes to the traffic designed not merely to trick the fingerprinting engine into falsely detecting one site's traffic as another's, but instead to blend in adversarial example changes drawn from a broad collection of decoy sites' traffic.
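
A bare-bones version of that blending idea, applied to the synthetic fingerprinting sketch above, might look like the following. It is a loose abstraction, not Mockingbird's actual algorithm; in real traffic, "moving toward" a decoy trace would mean adding padding and dummy packets rather than freely editing numbers.

    def blend_toward_decoy(trace, decoy_traces, step=0.3, rng=np.random.default_rng()):
        """Move a traffic trace partway toward a randomly chosen decoy site's trace."""
        target = decoy_traces[rng.integers(len(decoy_traces))]
        return trace + step * (target - trace)

    # Defend one test trace using traces from every other site as decoys.
    real_site = y_te[0]
    defended = blend_toward_decoy(X_te[0], X_tr[y_tr != real_site])
    print("true site:", real_site, "fingerprinter now sees:", fingerprinter.predict([defended])[0])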

The resulting system, which the researchers call "Mockingbird" in a reference to its blended mimicking strategy, adds significant overhead—about 56 percent more bandwidth than normal Tor traffic. But it makes fingerprinting far more difficult: The accuracy of their machine-learning model's predictions of which website a user was visiting dropped to between 27 and 57 percent. And because of the randomized way they tweaked the data, that protection would be tough for an attacker to overcome with adversarial training, says Matthew Wright, one of the RIT researchers. "Because we’re jumping around in this random way, it would be really hard for an attacker to come up with all the different possibilities and enough of his own adversarial examples that cover all of them," says Wright.

These early experiments in using adversarial examples as a protective mechanism rather than a flaw are promising from a privacy standpoint, says Brendan Dolan-Gavitt, a computer scientist at NYU's Tandon School of Engineering who focuses on machine learning and security. But he warns that they're fighting the tide of machine-learning research: The vast majority of academics working on machine learning see adversarial examples as a problem to solve, rather than a mechanism to exploit.

Sooner or later, Dolan-Gavitt says, they may solve it and remove adversarial examples as a privacy feature in the process. "It’s definitely viable for the state of the art, given what we know right now," says Dolan-Gavitt. "I think my main concern is that protecting against adversarial examples and training machine-learning models that won’t be vulnerable to them is one of the hottest topics in machine learning right now. The authors are betting this is a fundamental problem that can’t be overcome. I don’t know if that’s the right bet."

After all, Dolan-Gavitt points out, it's desirable for machine learning to work when it's detecting tumors or guiding cars. But with every advance that increases machine learning's powers of divination, it also becomes that much harder to hide from it.

