Vocal Tract Shapes Determine Speech Sounds

American Institute of Physics

WASHINGTON, March 21, 2023 – Only humans have the ability to use speech. Remarkably, spoken communication remains understandable across accents, social backgrounds, and anatomies, despite the wide variety of ways to produce the necessary sounds.

In JASA, published on behalf of the Acoustical Society of America by AIP Publishing, researchers from the University Hospital and Medical Faculty of RWTH Aachen University explored how anatomical variations in a speaker's vocal tract affect speech production.

The vocal tract resembles an air duct that starts at the vocal cords, runs vertically through the larynx, bends at the back of the mouth, and then runs horizontally to the lips. However, surrounding organs, such as the lips, tongue, cheeks, and teeth, can change the shape of the duct and the resulting sound.

"Speaking is like playing a music instrument," said author Antoine Serrurier. "For vowels, the vocal cords are the sound source, and the vocal tract is the instrument."

Using MRI, the team recorded the shape of the vocal tract for 41 speakers as the subjects produced a series of representative speech sounds. They averaged these shapes to establish a sound-independent model of the vocal tract. Then they used statistical analysis to extract the main variations between speakers.
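The two steps described above, averaging each speaker's shapes across sounds and then extracting the main variations between speakers, can be illustrated with a minimal sketch. It assumes principal component analysis as the statistical method (the release only says "statistical analysis") and uses synthetic numbers in place of MRI contours. The array layout, the function name `main_variations`, and all data values are hypothetical.

```python
import numpy as np


def main_variations(articulations, n_components=4):
    """Average over sounds, then run PCA across speakers.

    articulations: array of shape (n_speakers, n_sounds, n_features),
    a hypothetical layout with one flattened contour per speaker and sound.
    """
    # Step 1: a sound-independent shape per speaker (mean over the sounds axis).
    speaker_shapes = articulations.mean(axis=1)
    # Step 2: center on the mean speaker and decompose with an SVD (assumed PCA).
    mean_shape = speaker_shapes.mean(axis=0)
    centered = speaker_shapes - mean_shape
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2
    explained = variance / variance.sum()
    return mean_shape, components[:n_components], explained[:n_components]


# Synthetic stand-in data: 41 speakers, 10 sounds, 60 contour coordinates.
rng = np.random.default_rng(0)
n_speakers, n_sounds, n_features = 41, 10, 60
latent = rng.normal(size=(n_speakers, 2)) * np.array([3.0, 1.5])  # two made-up speaker factors
directions = rng.normal(size=(2, n_features))
speaker_part = (latent @ directions)[:, None, :]            # differs per speaker
sound_part = rng.normal(size=(1, n_sounds, n_features))     # shared by all speakers
noise = 0.1 * rng.normal(size=(n_speakers, n_sounds, n_features))
articulations = speaker_part + sound_part + noise

mean_shape, components, explained = main_variations(articulations)
print("cumulative explained variance:", explained.cumsum())
```

Because the synthetic data is built from two speaker factors, the first two components capture almost all of the between-speaker variance here. Real vocal tract data would spread the variance over more components.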

A handful of factors explained nearly 90% of the differences between speakers. Most important were the horizontal and vertical lengths of the vocal tract. The latter captures the difference between men and women: Women have higher larynxes and therefore shorter vocal tracts. The inclination of the head and the shape of the hard palate were also important.

Increasing the vocal tract length by 1 cm (in the horizontal or vertical direction) changed the resonance frequencies that distinguish vowels by 7%-8%. The other main factors have a smaller acoustic influence on average but could influence particular resonances for certain types of sounds.
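To see why length matters so much, a rough textbook approximation treats the vocal tract as a uniform tube closed at the vocal cords and open at the lips, whose resonances scale inversely with length. The sketch below is not the researchers' MRI-based model and is not expected to reproduce their 7%-8% figure. The speed of sound, the example tract lengths, and the function names are assumptions chosen for illustration.

```python
SPEED_OF_SOUND_M_PER_S = 350.0  # assumed value for warm, humid air


def tube_resonances(length_m, count=3):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_M_PER_S / (4.0 * length_m)
        for n in range(1, count + 1)
    ]


def relative_shift(length_m, extra_m=0.01):
    """Fractional drop in the first resonance when the tube grows by extra_m."""
    before = tube_resonances(length_m)[0]
    after = tube_resonances(length_m + extra_m)[0]
    return (before - after) / before


# Hypothetical tract lengths, chosen only as examples.
for length_cm in (14, 17):
    shift = relative_shift(length_cm / 100.0)
    print(f"{length_cm} cm tube, +1 cm: resonances drop by {shift:.1%}")
```

In this simplified model every resonance drops by the same fraction, and the drop is larger for shorter tubes.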

"In our view, anatomy is what forms the basis to produce speech and deserves to be well analyzed and understood," said Serrurier. "Our study proposes a method and a model to disentangle the contribution of the morphology from the pure strategy of a speaker."

The researchers plan to increase the number of speakers to make their model more accurate. They also aim to remove the vocal tract size variations to explore the other, less pronounced factors in more detail.
