LAWRENCE — New research from the University of Kansas uses network science to determine why people make mistakes when lip reading.
Michael Vitevitch, professor of speech-language-hearing at KU, and his co-authors created a visual map of around 20,000 words in English, hoping to better grasp why some words are more difficult to lip-read than others.
The results appear in the Journal of the Acoustical Society of America. The findings could improve training for lip readers and boost the capacity of artificial intelligence to read lips and provide transcription and other digital services.
"What we looked at in this study is how people basically read lips, how accurate they are and, more specifically, what kinds of mistakes they make," Vitevitch said. "A lot of previous work looked at how accurate people were and didn't necessarily look at the characteristics of the errors themselves. There's a lot to be learned from the mistakes you make, and that was the approach we took."
While previous work on lip reading examined errors, much of that research was done by spoken-language researchers who focused on phonemes — the sounds in a language — and on how close participants were to the word as it sounds.
Vitevitch took a different approach.
"We focused on the visual characteristics," he said. "Instead of looking at how many sounds of the word people got, we looked at how many of the visual characteristics, which we call 'visemes' (the visual equivalent of a phoneme), they got. We focused on what you're getting from the lips, jaw and mouth without using auditory sound. You're just trying to get the information from what you're seeing."
"How does that sound look when it's spoken? We don't care what it sounds like; we care about how it looks when it's spoken," he said. "Sometimes words sound similar and look similar, such as 'kit,' 'cat' and 'cut.' Other times words don't sound alike but still look similar like 'vet,' 'fit' and 'fuzz.' In both cases if you're just looking at my face, you couldn't tell one word from the other."
Through analysis of the word map, researchers determined:
- People are more likely to mistake a word for another word used more commonly.
- When spoken, about a third of words in English look like at least one other word.
- If a word has many visual look-alikes, it's consistently harder to lip-read.
- Lip-reading mistakes don't happen randomly — they're more likely when visually similar words occupy the same region in the visual network.
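The look-alike idea behind these findings can be illustrated with a minimal Python sketch. It is not the researchers' method: the viseme table `VISEME_OF`, the simplified transcriptions in `PHONEMES` and the helper names are all assumptions made for illustration, chosen so that the examples Vitevitch gives ('kit,' 'cat' and 'cut'; 'vet,' 'fit' and 'fuzz') collapse into the same visual pattern. Lumping every vowel into one class is a deliberate oversimplification.

```python
# Toy phoneme-to-viseme table. This grouping is an assumption for
# illustration only; it is not the mapping used in the study.
VISEME_OF = {
    "k": "K", "g": "K",                      # back-of-mouth consonants
    "f": "F", "v": "F",                      # lip-to-teeth consonants
    "t": "T", "d": "T", "s": "T", "z": "T",  # tongue-tip consonants
    "p": "P", "b": "P", "m": "P",            # closed-lip consonants
    "ih": "V", "ae": "V", "ah": "V", "eh": "V",  # all vowels lumped together
}

# Hand-written, simplified transcriptions (hypothetical, ARPAbet-like symbols).
PHONEMES = {
    "kit": ["k", "ih", "t"],
    "cat": ["k", "ae", "t"],
    "cut": ["k", "ah", "t"],
    "vet": ["v", "eh", "t"],
    "fit": ["f", "ih", "t"],
    "fuzz": ["f", "ah", "z"],
    "bat": ["b", "ae", "t"],
}


def to_visemes(word):
    """Replace each phoneme of a word with its viseme class."""
    return tuple(VISEME_OF[p] for p in PHONEMES[word])


def lookalike_groups(words):
    """Group words whose viseme sequences are identical."""
    groups = {}
    for word in words:
        groups.setdefault(to_visemes(word), []).append(word)
    return groups


groups = lookalike_groups(PHONEMES)
for visemes, members in groups.items():
    print("-".join(visemes), members)
```

In this toy vocabulary, 'bat' is the only word without a visual twin, while the other six fall into two groups of three that a viewer could not tell apart from the face alone.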
"One surprise was that people aren't that good at this," Vitevitch said. "We think we are, but we're really not. Most of the errors show that you're one or two visual characteristics — one or two visemes — off. You're getting a good amount of it, but perhaps not enough to get by."
The researchers' visual map allowed them to understand how words are distributed throughout the landscape, according to Vitevitch. In the map, words were close when they looked similar and farther apart when the words appeared visually unalike.
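A rough sketch of such a map, under stated assumptions: each word is written as a string of hypothetical one-letter viseme codes, the distance between two words is the edit distance between those strings, and two words are linked when they are at most one viseme apart. The article does not say how the study defined closeness, so the threshold, the codes in `WORDS` and the function names are all illustrative.

```python
from itertools import combinations


def viseme_distance(a, b):
    """Edit (Levenshtein) distance between two viseme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # drop a viseme
                            curr[j - 1] + 1,      # add a viseme
                            prev[j - 1] + cost))  # swap a viseme
        prev = curr
    return prev[-1]


# Hypothetical viseme codes, one letter per viseme class (assumed, not from the study).
WORDS = {
    "kit": "KVT",
    "cat": "KVT",
    "fit": "FVT",
    "fuzz": "FVT",
    "bat": "PVT",
    "map": "PVP",
}


def build_network(words, max_distance=1):
    """Link every pair of words whose viseme strings are within max_distance."""
    neighbors = {word: set() for word in words}
    for w1, w2 in combinations(words, 2):
        if viseme_distance(words[w1], words[w2]) <= max_distance:
            neighbors[w1].add(w2)
            neighbors[w2].add(w1)
    return neighbors


network = build_network(WORDS)
for word, linked in network.items():
    print(word, len(linked), sorted(linked))
```

Counting a word's links gives a simple measure of how crowded its neighborhood is. In this toy example 'bat' has five visual competitors while 'map' has one, and the same distance function could score a lip reader's guess as zero, one or two visemes away from the target.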
"Certain areas become more compressed than you might expect," he said. "The landscape stretches and compresses in ways we hadn't anticipated. That stretching and compression has implications for how accurate you're going to be when trying to lip-read. Does it give you more competitors than you would otherwise have? Or does it move things farther apart and make them more perceptually distinct?"
The KU researcher said his group hopes to move into lip-reading training.
"The idea is that if you track people's errors over time, those errors should start shrinking toward the target word," Vitevitch said. "Instead of being far away, people begin picking up the information they need and making more accurate guesses."
Another application of the research is training automatic transcription systems.
"Systems such as Zoom already do a reasonable job transcribing speech," Vitevitch said. "Could they do better if they used not only audio but also visual information from a speaker's face? Computers are very good at finding patterns, and sometimes they're the same patterns humans use. We may be able to train computers to do things in a more humanlike way."
Vitevitch said his group will continue to follow up on this work in different ways.
"We're continuing to explore how people do this, potentially moving toward machine-learning applications and finding ways to help people who need assistance understanding speech," he said.
Vitevitch's co-authors were KU graduate students Maia Flynn and Reid Kelly, along with Lorin Lachs of California State University, Fresno.