Machines’ ability to learn by processing data gleaned from sensors underlies automated vehicles, medical devices and a host of other emerging technologies. But that learning ability leaves systems vulnerable to hackers in unexpected ways, researchers at Princeton University have found.
In a series of recent papers, a research team has explored how adversarial tactics applied to artificial intelligence (AI) could, for instance, trick a traffic-efficiency system into causing gridlock or manipulate a health-related AI application to reveal patients’ private medical history. As an example of one such attack, the team altered a driving robot’s perception of a road sign from a speed limit to a “Stop” sign, which could cause a vehicle to dangerously slam on the brakes at highway speeds; in other examples, they altered stop signs to be perceived as a variety of other traffic instructions.
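The sign-misreading attack works by adding a small, carefully chosen perturbation to the input so the classifier's decision flips. The toy sketch below illustrates the general idea on a linear scorer; the weights, inputs, and the gradient-sign perturbation rule are generic illustrations, not the researchers' actual method or data.

```python
# Toy sketch of an adversarial perturbation (FGSM-style), the kind of attack
# that can make a classifier misread a road sign. All numbers are illustrative.

def score(x, w):
    """Linear score: positive -> 'speed limit', negative -> 'stop'."""
    return sum(wi * xi for wi, xi in zip(w, x))

w = [0.5, -0.25, 0.75, 0.1]   # fixed, illustrative classifier weights
x = [0.2, 0.4, 0.1, 0.3]      # an input the classifier reads as 'speed limit'

eps = 0.3                     # small per-feature perturbation budget
# Nudge each feature by eps against the sign of its weight (the gradient
# of the score), which is the fastest way to push the score negative.
x_adv = [xi - eps * (1 if wi > 0 else -1) for wi, xi in zip(w, x)]

print(score(x, w) > 0)      # True: original input classified 'speed limit'
print(score(x_adv, w) < 0)  # True: perturbed input now classified 'stop'
```

The perturbation is tiny per feature, yet it reliably crosses the decision boundary, which is why such attacks can be imperceptible to humans while fooling the model.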
“If machine learning is the software of the future, we’re at a very basic starting point for securing it,” said Prateek Mittal, the lead researcher and an associate professor in the Department of Electrical Engineering at Princeton. “For machine learning technologies to achieve their full potential, we have to understand how machine learning works in the presence of adversaries. That’s where we have a grand challenge.”
Just as software is prone to being hacked and infected by computer viruses, or its users targeted by scammers through phishing and other security-breaching ploys, AI-powered applications have their own vulnerabilities. Yet the deployment of adequate safeguards has lagged. So far, most machine learning development has occurred in benign, closed environments – a radically different setting than out in the real world.
Mittal is a pioneer in understanding an emerging vulnerability known as adversarial machine learning. In essence, this type of attack causes AI systems to produce unintended, possibly dangerous outcomes by corrupting the learning process. In their recent series of papers, Mittal’s group described and demonstrated three broad types of adversarial machine learning attacks.
Poisoning the data well
The first attack involves a malevolent agent inserting bogus information into the stream of data that an AI system is using to learn – an approach known as data poisoning. One common example is a large number of users’ phones reporting on traffic conditions. Such crowdsourced data can be used to train an AI system to develop models for better collective routing of autonomous cars, cutting down on congestion and wasted fuel.
“An adversary can simply inject false data in the communication between the phone and entities like Apple and Google, and now their models could potentially be compromised,” said Mittal. “Anything you learn from corrupt data is going to be suspect.”
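The mechanics of such an injection are simple: fabricated reports mixed into the crowdsourced stream skew whatever statistic the system learns from it. The sketch below is a hypothetical illustration with made-up numbers, not the study's actual setup.

```python
# Hypothetical sketch of data poisoning: an adversary injects fabricated
# speed reports into a crowdsourced traffic feed before a routing system
# learns from it. All values are illustrative.

def average_speed(reports):
    """Estimate road speed (mph) from crowdsourced phone reports."""
    return sum(reports) / len(reports)

# Honest phones report free-flowing traffic around 60 mph.
honest_reports = [58.0, 61.0, 60.0, 59.0, 62.0]

# The adversary injects bogus reports claiming near-standstill traffic.
poisoned_reports = honest_reports + [2.0] * 5

clean_estimate = average_speed(honest_reports)       # 60.0 mph
poisoned_estimate = average_speed(poisoned_reports)  # 31.0 mph

# A router trained on the poisoned feed would divert cars away from a
# road that is actually clear, and could be steered into creating
# congestion elsewhere.
print(clean_estimate, poisoned_estimate)
```

Real systems use far richer models than a mean, but the principle is the same: a model learned from corrupted data inherits the corruption.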
Mittal’s group recently demonstrated a more sophisticated variant of this simple data poisoning, an approach they call “model poisoning.” In AI, a “model” might be a set of ideas that a machine has formed, based on its analysis of data, about how some part of the world works. Because of privacy concerns, a person’s cellphone might generate its own localized model, allowing the individual’s data to be kept confidential. The anonymized models are then shared and pooled with other users’ models. “Increasingly, companies are moving towards distributed learning where users do not share their data directly, but instead train local models with their data,” said Arjun Nitin Bhagoji, a Ph.D. student in Mittal’s lab.
But adversaries can put a thumb on the scales. A person or company with an interest in the outcome could trick a company’s servers into weighting their model’s updates over other users’ models. “The adversary’s aim is to ensure that data of their choice is classified in the class they desire, and not the true class,” said Bhagoji.
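In distributed learning of this kind, the server typically combines users' local updates by a weighted average, so an adversary who gets its contribution over-weighted can drag the global model toward its chosen outcome. The sketch below is a deliberately simplified, hypothetical illustration with scalar "models"; the weighting scheme and numbers are assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch of model poisoning in distributed (federated) learning.
# Each user trains locally and sends only a model update; the server takes a
# weighted average. An adversary that inflates its own weight (e.g. by
# claiming a falsely large local dataset) can dominate the result.
# All values are illustrative.

def federated_average(updates, weights):
    """Weighted average of scalar model updates from each user."""
    total = sum(weights)
    return sum(w * u for w, u in zip(weights, updates)) / total

# Honest users' local updates cluster near the true value, 1.0.
honest_updates = [1.0, 0.5, 1.5, 1.0]

# The adversary submits an update pushing toward its desired (wrong) class.
updates = honest_updates + [-1.0]

honest_weights = [1, 1, 1, 1, 1]    # adversary counted like everyone else
boosted_weights = [1, 1, 1, 1, 10]  # adversary's update weighted 10x

fair_model = federated_average(updates, honest_weights)    # stays positive
poisoned_model = federated_average(updates, boosted_weights)  # flips negative
print(fair_model, poisoned_model)
```

With equal weights the honest majority absorbs the bad update, but once the adversary's contribution is over-weighted the pooled model's sign flips, i.e. the data of the adversary's choice lands in the class it desires.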
In June, Bhagoji presented a paper on this topic at the 2019 International Conference on Machine Learning (ICML) in Long Beach, California, in collaboration with two researchers from IBM Research. The paper explored a test model that relies on image recognition to classify whether people in pictures are wearing sandals or sneakers. While an induced misclassification of that nature sounds harmless, it is the sort of unfair subterfuge an unscrupulous corporation might engage in to promote its product over a rival’s.