Over the last few years, systems and applications that help visually impaired people navigate their environment have undergone rapid development, but still have room to grow, according to a team of researchers at Penn State. The team recently combined recommendations from the visually impaired community with artificial intelligence (AI) to develop a new tool that offers support tailored specifically to the needs of people who are visually impaired.
The tool, known as NaviSense, is a smartphone application that identifies items users are looking for in real time based on spoken prompts, guiding users to objects in their environment through the phone's built-in audio and vibration capabilities. Test users reported an improved experience compared to existing visual aid options. The team presented the tool and received the Best Audience Choice Poster Award at the Association for Computing Machinery's SIGACCESS ASSETS '25 conference, held Oct. 26-29 in Denver. Details of the tool were published in the conference proceedings.
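The article does not describe how those audio and vibration cues are produced, but the general idea can be pictured in a short sketch. Everything below is illustrative: the cue format, thresholds, and frame measurements are assumptions, not details from the NaviSense paper.

```python
# Hypothetical sketch of turning an object's position in the camera frame into
# the audio and vibration cues described above. The cue format and thresholds
# are illustrative assumptions, not details from the NaviSense paper.

def guidance_cue(box_center_x: float, frame_width: float) -> dict:
    """Map an object's horizontal position to a spoken direction and a
    vibration strength that grows as the object nears the center of view."""
    # Normalized offset from the center of the frame: -1 (far left) to +1 (far right).
    offset = (box_center_x - frame_width / 2) / (frame_width / 2)

    if offset < -0.2:
        direction = "turn left"
    elif offset > 0.2:
        direction = "turn right"
    else:
        direction = "straight ahead"

    # Stronger vibration the closer the object sits to the center of the frame.
    vibration = max(0.0, 1.0 - abs(offset))
    return {"speech": direction, "vibration": round(vibration, 2)}


# Example: an object slightly right of center in a 1080-pixel-wide frame.
print(guidance_cue(box_center_x=640.0, frame_width=1080.0))
```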
According to Vijaykrishnan Narayanan, Evan Pugh University Professor, A. Robert Noll Chair Professor of Electrical Engineering and NaviSense team lead, many existing visual-aid programs connect users with an in-person support team, which can be inefficient or raise privacy concerns. Some programs offer an automated service, but Narayanan explained that these programs have a glaring issue.
"Previously, models of objects needed to be preloaded into the service's memory to be recognized," Narayanan said. "This is highly inefficient and gives users much less flexibility when using these tools."
To address this problem, the team integrated large language models (LLMs) and vision-language models (VLMs), both types of AI that can process large amounts of data to answer queries, into NaviSense. The app connects to an external server that hosts the LLMs and VLMs, which allows NaviSense to learn about its environment and recognize the objects in it, according to Narayanan.
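The proceedings paper would hold the actual interface, but the client-server split described here can be pictured with a minimal sketch. Everything below is hypothetical: the endpoint URL, request fields, and response schema are stand-ins; the only idea taken from the article is that the phone sends a spoken request and a camera frame to a remote server running the LLM and VLM, then acts on the reply.

```python
# Hypothetical sketch of the client-server flow described above: the phone
# sends the user's spoken request and a camera frame to a remote server that
# hosts the LLM/VLM; the server replies with matching objects or a question.
# Endpoint, field names, and response schema are illustrative assumptions.
import base64
import requests

SERVER_URL = "https://example-navisense-server.org/locate"  # placeholder URL


def request_object_location(spoken_prompt: str, frame_jpeg: bytes) -> dict:
    """Send one voice prompt and one camera frame; return the server's answer."""
    payload = {
        "prompt": spoken_prompt,                                # e.g. "find my blue water bottle"
        "frame": base64.b64encode(frame_jpeg).decode("ascii"),  # current camera frame
    }
    response = requests.post(SERVER_URL, json=payload, timeout=10)
    response.raise_for_status()
    # Assumed response shapes: either a detection with a bounding box, or a
    # clarifying question when the request was ambiguous, for example:
    #   {"status": "found", "label": "...", "box": [x1, y1, x2, y2]}
    #   {"status": "clarify", "question": "Which bottle do you mean?"}
    return response.json()
```

Because the heavy models stay on the server, the phone only needs to stream prompts and frames and play back the resulting cues, which is consistent with the article's point that no object models are preloaded on the device.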
"Using VLMs and LLMs, NaviSense can recognize objects in its environment in real-time based on voice commands, without needing to preload models of objects," Narayanan said. "This is a major milestone for this technology."
According to Ajay Narayanan Sridhar, a computer engineering doctoral student and the lead student investigator on NaviSense, the team conducted a series of interviews with people who are visually impaired before development began, so the tool's features could be tailored specifically to users' needs.
"These interviews gave us a good sense of the actual challenges visually impaired people face," Sridhar said.
NaviSense searches an environment for a requested object, specifically filtering out objects that do not fit a user's verbal request. If it doesn't understand what the user is looking for, it will ask a follow-up question to help narrow down the search. Sridhar said that this conversational feature offers convenience and flexibility that other tools struggle to provide.
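As a rough illustration of that filter-then-clarify behavior, the sketch below keeps only detections that match the spoken request and falls back to a follow-up question when none do. The data shapes, match scoring, and threshold are assumptions rather than the authors' implementation.

```python
# Illustrative sketch of the filter-then-clarify behavior described above.
# The detection format, match scoring, and threshold are all assumptions.
from dataclasses import dataclass


@dataclass
class Detection:
    label: str        # object name reported by the vision model
    score: float      # how well the label matches the user's spoken request (0-1)
    box: tuple        # bounding box (x1, y1, x2, y2) in the camera frame


MATCH_THRESHOLD = 0.6  # assumed cutoff for "this is probably what was asked for"


def filter_or_clarify(request: str, detections: list[Detection]) -> dict:
    """Keep only detections that fit the request; ask a follow-up if none do."""
    matches = [d for d in detections if d.score >= MATCH_THRESHOLD]
    if not matches:
        # Nothing fit the request well enough: ask a clarifying question
        # instead of guessing, as the article describes.
        return {"action": "ask",
                "question": f"I couldn't find '{request}'. Can you describe it differently?"}
    best = max(matches, key=lambda d: d.score)
    # Guide the user toward the best match using audio and vibration cues.
    return {"action": "guide", "target": best}
```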