Your smartphone can be tapped through a zero-permission app, study shows

At the Network and Distributed System Security Symposium 2020, a research team of scientists from Zhejiang University, McGill University and the University of Toronto delivered an unsettling message to the general public. In their paper, they reported a new way to attack smartphones: an app can use the phone's built-in accelerometer to eavesdrop on the phone's loudspeaker, recognizing the speech it emits and reconstructing the corresponding audio signals.

What is alarming is that the attack is not only covert but also “lawful”: users may leak private information without ever noticing, while the attackers may face no legal liability.

An accelerometer is a common smartphone sensor that measures proper acceleration. It usually consists of a mass, a spring and a damper. It is widely assumed that, unlike the microphone, the camera or location services, the accelerometer cannot easily obtain or infer sensitive personal information. As a result, apps can read it without requesting any sensitive system permission, which leaves virtually no barrier to eavesdropping on the phone's loudspeaker. This is why such intrusions are not only hidden but also “legitimate”.
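The mass-spring-damper description can be made concrete with the standard textbook model of an accelerometer. This is a generic idealization, not a model taken from the study: $m$ is the proof mass, $c$ the damping coefficient, $k$ the spring stiffness, $x(t)$ the displacement of the mass relative to the sensor housing, and $a(t)$ the acceleration of the housing (and hence of the circuit board it is mounted on).

```latex
m\,\ddot{x}(t) + c\,\dot{x}(t) + k\,x(t) = -m\,a(t),
\qquad \omega_n = \sqrt{k/m}

% For vibrations well below the natural frequency \omega_n,
% the inertial and damping terms are small, so
x(t) \approx -\frac{m}{k}\,a(t) = -\frac{a(t)}{\omega_n^{2}}
```

In this regime the measured displacement simply tracks the board's acceleration, so any vibration reaching the board within the sensor's bandwidth, including vibration caused by the loudspeaker, shows up in the readings.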

Fig. 1 Accelerometer-based smartphone eavesdropping and the workflow of speech recognition and speech reconstruction

The researchers found that because the accelerometer and the loudspeaker are mounted on the same board in close proximity to each other, speech emitted by the loudspeaker has a significant impact on the accelerometer, no matter where and how the smartphone is placed (on a table or in the hand). From the vibration signals the accelerometer picks up, the emitted speech can be recognized and even reconstructed.

The researchers named this attack AccelEve, an accelerometer-based side-channel attack against smartphone speakers.

“Whether you are using your smartphone hands-free or not, the speaker’s voice is first converted to radio signals and then sent out via the loudspeaker, triggering vibrations on the board. The accelerometer can sense these vibrations,” says REN Kui, Dean of the Zhejiang University School of Cyber Science and Technology.

The researchers designed a deep learning-based system to recognize and reconstruct speech from these acceleration signals.
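The release does not say how raw acceleration readings are prepared for the deep learning models. A common first step for speech-like signals is a short-time Fourier transform, which turns a 1-D trace into a time-frequency image. The sketch below assumes a single-axis trace and an illustrative 500 Hz sampling rate; the function name `accel_spectrogram` and its frame and hop sizes are hypothetical choices, not the researchers' settings.

```python
import numpy as np


def accel_spectrogram(samples, frame_len=128, hop=64):
    """Log-magnitude spectrogram of a 1-D accelerometer trace.

    Hypothetical preprocessing sketch: frame_len and hop are illustrative
    choices, not parameters reported by the AccelEve researchers.
    """
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()  # drop the constant offset (e.g. gravity along this axis)
    if len(x) < frame_len:
        raise ValueError("trace is shorter than one analysis frame")
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # one spectrum per frame
    return np.log1p(magnitude)  # shape: (n_frames, frame_len // 2 + 1)


if __name__ == "__main__":
    fs = 500  # assumed sampling rate in Hz (illustrative, not from the study)
    t = np.arange(fs) / fs  # one second of sample times
    # Synthetic trace: gravity plus a faint 100 Hz vibration standing in for speech.
    trace = 9.81 + 0.01 * np.sin(2 * np.pi * 100 * t)
    spec = accel_spectrogram(trace)
    peak_bin = int(spec[0].argmax())
    print(spec.shape, peak_bin * fs / 128)  # frame 0's strongest bin, in Hz
```

Each row of the result is one short time slice and each column one frequency band, which is the kind of 2-D input image-style neural networks commonly consume; a real attack model would be trained on many such slices.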

For speech recognition, the researchers used a hot-word search model to identify pre-trained hot words in 200 short sentences collected from four volunteers (two men and two women). Each sentence contains one to three sensitive words, such as passwords, usernames, social information, security information, numbers, email addresses and card numbers. The recognition model achieves over 90% accuracy, and even in a noisy environment the accuracy still reaches 80%. This means information spoken during phone calls could well be stolen.

The researchers also built a reconstruction model that recovers full-sentence audio from acceleration signals. When volunteers listened to the reconstructed sentences, they could easily tell whether a hot word had been misidentified.

To evaluate the models in an end-to-end attack on phone conversations, the researchers designed a real-world scenario in which the victim calls a remote party and requests a password during the conversation. Using the password search model, they located the password in over 85% of the conversations across three scenarios: 1) Table-setting, where the victim's smartphone lies on a table; 2) Handhold-sitting, where the victim sits on a chair holding the smartphone; and 3) Handhold-walking, where the victim walks around holding the smartphone.

“From a criminal’s perspective, the goal is not to reconstruct the signals completely,” REN Kui stresses. “As long as attackers retrieve sensitive information, they stand to ‘benefit’. In this sense, monitoring users costs them nothing.”

For example, if a user downloads a spy app disguised as a chess game, attackers can collect the speech-induced accelerometer data recorded while the user is on a call and transmit it to their own back end without any permission being granted. They can then recognize or reconstruct the speech by feeding the data into a machine learning model.

In fact, “zero-permission” sensors such as the accelerometer remain in a grey area that no law or regulation has yet addressed. “Once hackers or criminals exploit these technologies, the losses to both national security and personal privacy will be immeasurable,” REN Kui remarks.

How can the threat the accelerometer poses to security be removed? “One option is to raise the accelerometer’s system permission level to that of the microphone, but that would require massive system and app updates. Another is to stop the accelerometer from picking up vibrations from the loudspeaker, which would prevent this type of side-channel attack entirely,” REN Kui observes.

However, both solutions carry substantial economic and social costs, so putting an end to this kind of eavesdropping in the short term will be extremely difficult. Nor are they once-and-for-all fixes. “As science and technology advance, sensors such as the accelerometer will keep improving in sampling rate and accuracy, making a wider variety of attacks possible,” says REN Kui.
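Why sampling rate matters can be illustrated with the Nyquist limit: a sensor sampling at a rate f_s can only represent frequencies up to f_s/2, and higher-frequency vibration folds (aliases) down into that band. The sketch below assumes an illustrative accelerometer rate of 500 Hz, which is not a figure from the study; `aliased_frequency` is a hypothetical helper that handles a single pure tone.

```python
def aliased_frequency(f_signal_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling at fs_hz (Nyquist folding)."""
    if fs_hz <= 0:
        raise ValueError("sampling rate must be positive")
    f = f_signal_hz % fs_hz  # fold into [0, fs)
    return fs_hz - f if f > fs_hz / 2 else f  # mirror the upper half into [0, fs/2]


if __name__ == "__main__":
    fs = 500.0  # assumed smartphone accelerometer rate in Hz (illustrative)
    for tone in (100.0, 300.0, 1200.0):
        print(f"{tone:.0f} Hz tone appears at {aliased_frequency(tone, fs):.0f} Hz")
    # 100 Hz tone appears at 100 Hz
    # 300 Hz tone appears at 200 Hz
    # 1200 Hz tone appears at 200 Hz
```

Under these assumptions, 300 Hz and 1200 Hz tones both land at 200 Hz and become indistinguishable. A faster sensor widens the band it can represent faithfully, which is one reason improved sampling rates could enable richer attacks.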

“To prevent abuse of the accelerometer, we hope more people will pay attention to smartphone hardware, particularly the security of its sensors, and keep their phones ‘under lock and key’. Our follow-up research will look into both the hardware and the software of mobile phones,” says REN Kui.

Public Release. The material in this public release comes from the originating organization and may be of a point-in-time nature, edited for clarity, style and length.