Human Or Machine?

Max Planck Society

How content, sound, and language knowledge shape our perception of voices

Stylized illustration of communication between a human and AI: on the left, a turquoise human head with sound waves; on the right, an orange robot head. The sound waves transform into binary code between them.

How do people perceive the difference between real and computer-generated voices?

© Illustration: MPIEA / L. Bittner


To the point

  • Human or machine? Computer-generated voices still sound less human than real voices.
  • Factors: The meaning and structure of speech, as well as its prosodic and acoustic features, influence how human a voice sounds.
  • Age matters: Listeners' language proficiency and age also influence their perception.

We are surrounded by computer-generated voices these days, from navigation systems and voice assistants to automated announcements. But how human do these voices actually sound? A recent study by the Max Planck Institute for Empirical Aesthetics (MPIEA) in Frankfurt am Main, Germany, shows that our perception is affected by three things: how something is said, what is being said, and whether we understand the language. The results have just been published in the journal Speech Communication.

In two consecutive experiments, the researchers investigated how people perceive the difference between real and synthetic voices. They created 16 short German sentences, such as: "The boy gave his father a hat." The team then manipulated the sentences in three different ways: by changing the word order, by replacing words with similar-sounding pseudowords, and by combining both changes. Together with the original, this resulted in four versions of each sentence. All versions were recorded by eight human speakers and eight computer-generated text-to-speech (TTS) voices.

In the first experiment, 40 German-speaking participants rated how human the voices sounded. Overall, the computer-generated voices were perceived as less human than the human voices. An analysis of the voices' acoustic characteristics revealed objectively measurable differences in sound between human and TTS-generated voices.

"We found that the timbre, or color, of the voice as well as the intonation were different between the two voice types. These differences could drive how human the voices sound to the listeners," reports lead author Janniek Wester of the MPIEA.

Beyond the acoustics of the voice, the content of what is said also influences how human a voice sounds to a listener. The researchers found that participants perceived the manipulated sentences as less human than the original sentences, regardless of whether they were spoken by a real person or a TTS-generated voice. However, this effect was only evident when listeners understood the language, as was revealed in the second part of the study.

In this second experiment, 40 German-speaking, 40 Spanish-speaking, and 40 Turkish-speaking participants evaluated the voices. The results showed that, for people with no knowledge of German, linguistic content played no role in the assessment of the voices. Although these listeners rated synthetic voices as more human-like than native German speakers did, they could still generally distinguish between human and artificial voices.

Furthermore, the age of the listeners plays a role in their assessment, as senior author Pauline Larrouy-Maestri of the MPIEA concludes: "In our studies, we keep finding that older adults tend to perceive computer-generated voices as sounding more human than younger people do and we want to understand why." In a future follow-up study involving participants of various age groups, the researchers plan to look into this in more detail.
