The Association for Computing Machinery's global Technology Policy Council has published "TechBrief: Automated Speech Recognition." It is the latest in a series of short technical bulletins that present scientifically grounded perspectives on the impact and policy implications of specific technological developments in computing.
The overarching finding of "TechBrief: Automated Speech Recognition" is that automated speech recognition (ASR) technologies require rigorous audit and governance practices, as well as guidelines to ensure they are inclusive and accessible. The authors note that automated speech recognition systems are quickly advancing by incorporating generative AI techniques similar to large language models (LLMs). They present an eye-opening list of ASR applications in daily life, including many uses the public may not have considered.
"The struggle is real. I can't get AI customer support to understand me, and auto-captioning never matches what I said; it's so frustrating," says Shaomei Wu, a co-author of this TechBrief, a person who stutters, and the founder and CEO of AImpower.org. "I feel voiceless and invalidated when the technology keeps interrupting or misunderstanding me."
Special attention is given to how ASR technologies can significantly impact people's lives. For example, the brief points out that more than half a million doctors have used ASR tools to transcribe patient visits, and more than 60% of Fortune 100 companies use an ASR-based software package when hiring new employees. By providing this context, the co-authors hope readers will come to understand how important it is that ASR applications work properly.
"This is a story that has fallen under the media radar," explained co-author Allison Koenecke, Assistant Professor, Cornell Tech. "There has been a good deal of focus on facial recognition and other new AI-based technologies, but relatively little discussion of speech recognition. Once we appreciate how critical it is that these applications function as intended, we come to realize the value of human oversight in how ASR is developed and deployed. Because the advances have been so rapid, and there has not been enough attention paid to speech recognition, we thought this was an opportune time to issue this report."
The TechBrief authors contend that the first step to building safe and responsible oversight frameworks is to recognize that a "one size fits all" approach won't work. Their key message is that the performance of an ASR system is highly variable depending on the setting. For example, the same medical patient's speech might reasonably be transcribed differently in varying circumstances (e.g., transcriptions used for diagnostic purposes may optimally include speech fragments and filler words, whereas transcriptions used for generating patient note summaries may not).
In addition to the performance of speech recognition in high-stakes applications such as healthcare and employment, the co-authors raise concerns about the fairness, transparency, and accountability of ASR systems.
For example, in discussing the Word Error Rate (WER) metric that is used to evaluate ASR programs, the co-authors relay several research findings that show how speech recognition can be inequitable. They cite statistics that 1) the WER in ASR systems was 1.1 to 3.4 times worse for Black American English speakers relative to White American English speakers, and 2) the WER in ASR systems was 2.8 to 4.2 times worse for Chicano English speakers relative to Standard American English speakers. The report contends that these statistics raise concerns about technological bias.
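For readers unfamiliar with the metric, WER is commonly defined as the number of word substitutions, deletions, and insertions needed to turn an ASR transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that common definition. It is not taken from the TechBrief; the function name `word_error_rate`, the lowercasing and whitespace tokenization, and the sample sentences are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference.

    Illustrative sketch: tokenization is a simple lowercase whitespace
    split, which is an assumption and not a standard any study must follow.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,          # deletion: reference word missing
                curr[j - 1] + 1,      # insertion: extra hypothesis word
                prev[j - 1] + cost,   # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example sentences, not data from the report.
reference = "the patient reports mild chest pain"
hypothesis = "the patient report mild pain"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

Under this definition, a statement that WER is "N times worse" for one group of speakers than another corresponds to the ratio of the two groups' WER values.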
"As this technology becomes more pervasive, evaluating its impact on people of all backgrounds is a responsibility for its developers," said co-author Jingjin Li, research fellow, AImpower.org.
"We shared emerging research on these questions because of its timely relevance," added Niranjan Sivakumar, a co-author of the ACM TechBrief, as well as Director and Head of Policy at AImpower.org. "Speech recognition is a field that has grown rapidly and is increasingly ubiquitous. We crafted this concise and accessible report to familiarize readers with how these tools are being used today and to draw attention to important issues to consider for the future. We hope our new report reaches a wide audience."
The brief's co-authors include Allison Koenecke, Niranjan Sivakumar, Jingjin Li, and Shaomei Wu.
ACM's TechBriefs are designed to complement ACM's activities in the policy arena and to inform policymakers, the public, and others about the nature and implications of information technologies. Earlier ACM TechBriefs have covered topics such as tech abuse, generative artificial intelligence, climate change, facial recognition, smart cities, quantum simulation, and election auditing. Topics under consideration for future issues include AI development and considerations around open-source initiatives.
About the ACM Technology Policy Council
ACM's global Technology Policy Council sets the agenda for global initiatives to address evolving technology policy issues and coordinates the activities of ACM's regional technology policy committees in the US and Europe. It serves as the central convening point for ACM's interactions with government organizations, the computing community, and the public in all matters of public policy related to computing and information technology. The Council's members are drawn from ACM's global membership.
About ACM
ACM, the Association for Computing Machinery, is the world's largest educational and scientific computing society, uniting computing educators, researchers, and professionals to inspire dialogue, share resources, and address the field's challenges. ACM strengthens the computing profession's collective voice through strong leadership, promotion of the highest standards, and recognition of technical excellence. ACM supports the professional growth of its members by providing opportunities for life-long learning, career development, and professional networking.