Building AI That Really Understands Human Speech

University of Auckland alumnus Alexei Dunayev is helping shape the future of voice-based artificial intelligence, making technology more accessible, natural and useful.

Alexei Dunayev, Principal TPM, Microsoft AI (MAI Superintelligence)

Alexei Dunayev has spent much of his life working towards a world where computers can understand people as naturally as people understand each other. It is an idea he has pondered since childhood.

"Speech recognition always seemed to be five years away, but voice is such a fundamental way we communicate. For technology to be truly useful, it needs to meet us there," he says.

That belief has shaped a career that has taken him from the University of Auckland to Seattle and Microsoft, where he is a technical programme manager in the AI superintelligence team. Along the way he has worked on Amazon's Alexa, as well as Google's AI and DeepMind.

What drives him is not just the technology, but what it enables. The ability for computers to understand and respond to human speech opens access to information, removes friction, and changes how people interact with the digital world, he says.

Alexei's path into that world began at the University of Auckland, where he studied commerce and was part of the founding team behind the Centre for Innovation and Entrepreneurship's Velocity programme (then called Spark). The goal was not only to launch a business competition, but to build a culture of innovation on campus.

"A big part of it was getting students excited about entrepreneurship and showing them how to create something useful," he says.

That idea of usefulness has been a guiding principle for Alexei. After completing an MBA at Stanford, he co-founded TranscribeMe, a start-up focused on improving speech recognition. Combining artificial intelligence with human data annotation, the company pushed the boundaries of what was possible, outperforming what IBM or Microsoft then offered in automated speech recognition.

Today, at Microsoft, Alexei's work spans broader AI systems, but is grounded in the same core challenge: enabling machines to understand and communicate with people.

Alexei is clear-eyed about what AI is and what it is not. Much of the public conversation, he says, is shaped by misconception.

"AI is not intelligent, even though it may sometimes look like it. AI systems are actually world-class pattern matchers."

Alexei Dunayev, Principal TPM, MAI Superintelligence, Microsoft AI

Understanding that matters.

"It's like the world's most advanced autocorrect. It doesn't know things the way humans do. It simply calculates the most likely path through an ocean of data.

"Recognising that AI is grounded in maths rather than magic makes the achievement more impressive, not less," Alexei says.

That pragmatic mindset is something Alexei associates strongly with New Zealand. Alexei, who was born in Ukraine but moved here when he was 14, believes Kiwis bring a resourceful, adaptable approach to innovation, shaped by working in a smaller market.

"We're forced to be more versatile. That adaptability and can-do spirit are massive assets, especially in a field that changes as quickly as AI."

It is a perspective that continues to influence his work, even as he operates at the global frontier of technology. During a recent trip home, he saw how quickly the future he had imagined is becoming reality. He showed his parents and grandparents how to use a voice-enabled AI assistant on their phones, and within minutes they were asking questions and exploring ideas across a range of topics.

"It was as easy as holding down a button," he says. "That world that didn't exist even ten years ago is now very close to being there for everybody."

For those considering a future in the field of AI, Alexei's advice is to build a strong technical foundation in computer science, then apply it to an area of genuine interest, whether that's science, law or policy.

"The real impact happens at the intersection of disciplines."

As AI continues to evolve, Alexei sees the current moment as both early and consequential. The technology is already proving its value across industries, but its full potential is still unfolding.

"It still feels like the ground floor, but I think we will soon be living in a world of superhuman speech recognition quality where computers understand speech even in complex, noisy environments. It feels meaningful to be involved in the development of that, rather than watching from the sidelines."

University of Auckland Public Release.