UMass Unveils All-Silicon Vision Chip for In-Sensor Tech

University of Massachusetts Amherst

AMHERST, Mass. — Researchers at the University of Massachusetts Amherst have advanced computer vision with new, silicon-based hardware that can both capture and process visual data in the analog domain. Their work, described in the journal Nature Communications, could ultimately benefit large-scale, data-intensive and latency-sensitive computer vision tasks.

"This is very powerful retinomorphic hardware," says Guangyu Xu , associate professor of electrical and computer engineering and adjunct associate professor of biomedical engineering at UMass Amherst. "The idea of fusing the sensing unit and the processing unit at the device level, instead of physically separating them apart, is very similar to the way that human eyes are processing the visual world."

Existing computer vision systems often shuttle redundant data between physically separated sensing and computing units. "For instance, the camera on your cell phone captures every single pixel of data in the field of view," says Xu. However, that image contains far more information than the system needs to identify an object or its movement. As a result, transmitting and processing this extra information introduces a lag in understanding the captured scene, a problem for tasks that are both time-sensitive and data-intensive.
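To make the redundancy concrete, consider a rough back-of-the-envelope comparison between streaming full frames and streaming only the pixels that change. The resolution, frame rate, and activity fraction below are illustrative assumptions, not figures from the study; the sketch only shows why change-driven readout can shrink the data stream.

```python
# Illustrative estimate of how much data a frame-based camera streams
# compared with an event-driven sensor that reports only changed pixels.
# All numbers are assumptions chosen for illustration, not from the study.

FRAME_WIDTH = 1920        # assumed 1080p sensor, pixels
FRAME_HEIGHT = 1080
BYTES_PER_PIXEL = 1       # 8-bit grayscale, for simplicity
FPS = 30                  # frames per second
ACTIVE_FRACTION = 0.02    # assume ~2% of pixels change between frames

frame_bytes = FRAME_WIDTH * FRAME_HEIGHT * BYTES_PER_PIXEL
full_rate = frame_bytes * FPS                      # bytes/s, full frames
event_rate = frame_bytes * ACTIVE_FRACTION * FPS   # bytes/s, changes only

print(f"full-frame stream:   {full_rate / 1e6:.1f} MB/s")
print(f"event-driven stream: {event_rate / 1e6:.2f} MB/s")
print(f"reduction factor:    {full_rate / event_rate:.0f}x")
```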

"Our technology is trying to cut this latency between the moment you sense the physical world and the moment you identify what you are capturing," he says.

Xu and his team created two integrated arrays of gate-tunable silicon photodetectors, or in-sensor visual processing arrays. Both share bipolar analog output and low-power operation: one array captures dynamic visual information, such as event-driven light changes, while the other captures the spatial features of static images to identify what an object is.
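Conceptually, such an in-sensor array computes because each photodetector's gate voltage tunes its responsivity, letting the device act as a positive or negative (bipolar) weight; the photocurrents then sum on a shared line, so the array outputs a weighted sum of the incident light before any digitization. The short Python sketch below models that idea in software; the 3x3 array size and the edge-detecting kernel are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

# Conceptual software model of an in-sensor processing array: each
# photodetector's gate-tunable responsivity acts as a bipolar weight,
# and summing the photocurrents on one output line performs the
# multiply-accumulate in the analog domain. The array size and the
# kernel values are illustrative assumptions, not from the paper.

rng = np.random.default_rng(0)
light = rng.uniform(0.0, 1.0, size=(3, 3))  # incident optical power per pixel

# Gate voltages program bipolar responsivities; here a Laplacian
# edge-detecting kernel stands in for whatever weights are trained.
responsivity = np.array([[ 0., -1.,  0.],
                         [-1.,  4., -1.],
                         [ 0., -1.,  0.]])

# Photocurrent of each detector = responsivity * optical power; the
# shared output line sums them (Kirchhoff's current law), yielding one
# analog value that already encodes the spatial feature.
output_current = (responsivity * light).sum()
print(f"analog output current (arbitrary units): {output_current:+.3f}")
```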

Scaling up these silicon arrays holds promise for retinomorphic computing and intelligent sensing. For dynamic scenes, when asked to classify human motions (walking, boxing, waving and clapping) in sophisticated environments, the new analog technology was accurate 90% of the time, while digital counterparts were 77.5% to 85% accurate. For static images, the technology classified handwritten numbers with 95% accuracy, outperforming methods without in-sensor computing capabilities (90%).

A unique feature of these arrays is that they are made of silicon, the same material used in computer chips, in contrast to prior in-sensor visual processors, which are mostly made of nanomaterials. As such, these arrays are more compatible with existing complementary metal–oxide–semiconductor (CMOS) technology, the most common semiconductor process for building integrated circuits in a wide range of electronic devices such as computers and memory chips. This compatibility makes them uniquely suited for large-scale computer vision tasks in which many processes are executed simultaneously, also known as high parallelism.

"Our all-silicon technology lends itself to CMOS integration, mass production and large-scale array operation with low variabilities, so I think that's a major leap in this field," says Xu.

Xu gives concrete examples of potential applications for this work. First is self-driving vehicles: "You always have to, in real time, process what is surrounding your vehicle and how fast they move," he says. Any reduction in processing time increases the safety of autonomous vehicles.

Another area that stands to benefit is bioimaging, where current technology may capture far more data than is actually needed. "We can perhaps compress the amount of data and give out the same biological insight for the scientists," he says.

This research was supported by the U.S. National Science Foundation.
