Enhancing Public Safety Video Analytics with Computer Vision and Artificial Intelligence


This image is a screenshot of the ETA software recognizing pedestrians and vehicles from an officer's dashboard camera.

An example of the ETA software developed by Voxel51.

In a world full of cameras, video understanding – the ability to accurately interpret what is happening in the footage – is likely to become the next revolution in data analytics. Researchers expect to see 45 billion cameras globally by 2022, and economic analysts expect the video analytics market value to exceed $40 billion by 2023. An estimated 30 million security cameras in the U.S. alone are collecting data constantly, offering enormous potential to enhance public safety operations. Even the most mundane footage could provide first responders with life-saving information, such as knowledge about people who have recently entered a building that is now burning, and lead to more efficient and effective emergency response strategies. The missing link is an affordable video analytics system that can automatically and accurately understand events as they unfold in a live feed.

With support from the NIST PSCR Public Safety Innovation Accelerator Program (PSIAP), a burgeoning startup called Voxel51 collaborated with Baltimore CitiWatch to develop an AI-powered video analytics platform and made it widely available to public safety organizations. The project, called “ETA: Extensible Toolkit for Analytics in Public Safety,” was led by Voxel51 co-founders Dr. Jason Corso and Dr. Brian Moore.

The Vigilant Eye of ETA

Each night in a basement on Howard Street in Baltimore, several retired officers sip their coffee as they flip through an endless series of security camera feeds from across the city. They scan each feed for potentially dangerous situations – a fistfight about to erupt into the street, a suspicious character checking car door handles, an elderly woman who appears lost – and alert first responders on the front lines. This is CitiWatch, an innovative public-private partnership whereby the city, residents, and businesses help make Baltimore safer. Owners of private surveillance systems voluntarily share their video feeds with CitiWatch, which coordinates with local and state emergency response units to improve incident response.

CitiWatch illustrates the challenge addressed by the ETA project in its purest form; the human eye cannot possibly monitor every detail captured by hundreds of camera feeds in Baltimore. Voxel51 is working with Baltimore CitiWatch and several other researchers and public safety stakeholders on a trial project to test and enhance ETA. This collaboration grew organically from the culture within the PSCR Analytics Portfolio that fostered a sharing of knowledge and tools across projects, and evolved into a lasting partnership between a driven team of scientists and a forward-thinking public safety champion.

Tackling the Ultimate Computer Vision Challenge

Despite the recent flurry of interest and investment in video analytics technologies, the field remains relatively underdeveloped in the context of Public Safety. Early computer vision research successes focused on subproblems such as face detection, whereas more recent advances attack image classification using massive annotated datasets and state-of-the-art deep learning approaches. Although these developments are important, critical limitations prevent them from directly benefiting the public safety community.

First, milestones in computer vision research are generally achieved using offline datasets, such as collections of YouTube videos. These do not reflect the challenges and real-time nature of various public safety applications, and they ignore any notion of privacy preservation, which is critical to Public Safety and the communities that rely on it.

Second are practical limitations. The research community has created a plethora of tools spread across different operating systems, programming languages, and software environments. Meanwhile, the analytics software tools available to the public safety community are often not state-of-the-art. They are typically closed-source, not interoperable, and cost-prohibitive. These factors make adoption difficult for communities without significant public safety information technology resources.

PSCR partnered with Voxel51 under the PSIAP to develop “ETA,” a core video analytics infrastructure that not only brings public safety analytics capabilities up to date with modern advances but does so in a way that makes widespread adoption and extension in the public safety community plausible. Put simply by Voxel51 co-founder and CEO, Jason Corso, “Our goal was to bring the many advances we see in the research arm of computer vision and artificial intelligence into a greater readiness level for actual deployment and practice by public safety.”

A suite of modern core analytics components is built into ETA, including low-level tools like stabilization and high-level modules like vehicle detection and attribute description. The features and functions of ETA were largely informed by a close working partnership with Baltimore CitiWatch throughout the project’s development.

The ETA model differs from the traditional proprietary approach to software solutions: it’s widely accessible, flexible enough for future growth, and encourages collaboration between innovators. “Not to get too philosophical,” explains Corso, “but we believe innovation is created in unpredictable ways by unpredictable people. So, perhaps the best way to create innovation around Public Safety is to break down the walls that typically exist in business.” The open-source, user-friendly nature of ETA speaks to this commitment. Innovators can deliver timely solutions to the target end-users faster because the infrastructure allows new solutions to be developed as they become necessary. Anyone can use ETA – regardless of their technical experience level – by creating a free account on the Voxel51 website and connecting their data to the online platform.


A screenshot of the ETA software recognizing people walking.

A demonstration of the ETA software: not only does the software recognize people and objects, it can also understand events and activities occurring in the feed.

The Voxel51 team evaluates its work through the lens of social responsibility; thus, the goal of privacy preservation was imperative to the success of their project. ETA leverages technology that blurs the identifiable visual information from an individual’s face while maintaining their facial expression. “We’re sensitive to the impact of technology on society,” explained Corso. “As leaders, we want to do the right thing and we’re in the position to make those choices in the right way.”

Voxel51: From Grant Project to Multimillion-Dollar Company

One week after Corso and Moore decided to apply to the PSIAP, they founded the computer vision startup company that would become the launchpad for the ETA project. Corso’s ambition to branch out from academia attracted him to the PSIAP opportunity. He wanted to take his ideas further than presentations and publications, to build an unprecedented bridge between computer vision research and applicable tools that have a positive effect on Public Safety. “I had been doing computer vision research for many years,” explained Corso. “We wanted to hit the ground and make an impact.”

The grant provided the Voxel51 team with a solid foundation to make the initial leap from academia to industry and establish themselves in the public safety video analytics space. “The key thing we learned is that there is a vast ocean between developing a capability and actually getting it in the hands of operations,” Moore explained. “It’s a very complex task from a legal point of view, a privacy point of view, the user point of view, time point of view, data, and so forth. The PSIAP grant created the opening we needed to learn about how our company can do that.” In the summer of 2019, Voxel51 announced that it had secured $2 million in seed funding from eLab Ventures. The company grew virtually overnight from four project staff members hired under the PSIAP award to 15 full-time employees.

“We leveraged the PSIAP project to start an entire company,” Corso stated. “The grant enabled us to build the foundation of the company, and the PSIAP experience gave us what we needed to demonstrate our track record to investors.”


A screenshot of the ETA software displaying its ability to recognize vehicles.

The ETA software performs a range of useful functions for public safety with its “video-first” approach.

In an increasingly crowded field, Voxel51 is distinguished by its “video-first” approach to analytics. The vast majority of video understanding tools collect visual information from individual video frames, which are often ambiguous. These tools then link observations between frames to draw conclusions about happenings in the feed. Frame-based analysis often fails to detect the nuanced dynamics of movement that characterize certain activities, like walking versus running. In 2011, Corso began to develop an alternative video analytics model that treats space and time as one 3D volume. At the time, he was researching 3D medical imaging and brought many of the experiences, ideas, and models into the video understanding domain. Eight years later, Corso and Moore are still in the leading-edge minority of computer vision scientists using a video-first approach rather than frame-based analysis.
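The difference between frame-based and video-first analysis can be illustrated with a toy sketch. This is not ETA's model or Voxel51's code; it is a minimal illustration under stated assumptions, and every name in it (make_clip, per_frame_area, temporal_energy) is hypothetical. A clip is stored as a small volume indexed by time, row, and column. A block of pixels moving slowly ("walking") and one moving quickly ("running") look identical when each frame is summarized in isolation, but a feature computed along the time axis of the volume separates them.

```python
# Toy illustration only: a clip is a volume clip[t][y][x] of 0/1 pixels.
# All function names here are hypothetical and not part of ETA.

def make_clip(step, n_frames=5, height=2, width=20, blob=4):
    """Return a clip in which a blob-wide block moves `step` columns per frame."""
    clip = []
    for t in range(n_frames):
        start = t * step
        frame = [
            [1 if start <= x < start + blob else 0 for x in range(width)]
            for _ in range(height)
        ]
        clip.append(frame)
    return clip


def per_frame_area(clip):
    """Frame-based feature: foreground pixel count of each frame, taken in isolation."""
    return [sum(map(sum, frame)) for frame in clip]


def temporal_energy(clip):
    """Volume-based feature: mean absolute pixel change along the time axis."""
    total = 0
    for prev, curr in zip(clip, clip[1:]):
        for row_prev, row_curr in zip(prev, curr):
            total += sum(abs(a - b) for a, b in zip(row_prev, row_curr))
    return total / (len(clip) - 1)


walking = make_clip(step=1)  # slow motion: 1 column per frame
running = make_clip(step=3)  # fast motion: 3 columns per frame

# Each frame on its own looks the same for both clips.
print(per_frame_area(walking) == per_frame_area(running))  # True

# Looking along the time axis separates the two motions.
print(temporal_energy(walking), temporal_energy(running))  # 4.0 12.0
```

A real video-first system would learn far richer spatiotemporal features than this single hand-written one, but the sketch shows why information along the time axis can resolve cases that individual frames leave ambiguous.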

The Impacts of ETA

In the years to come, Voxel51 and its collaborators expect to see ETA usher public safety analytics into a new level of capability not possible with existing tools and organizations. In the short term, Corso hopes to see ETA play a key role in shortening response time and improving public safety incident outcomes. Moore chimes in, “In five years, this technology could be running on every CCTV camera in every city in the U.S., which would enable public safety organizations to better triage video footage, assess incidents, deploy first responders, and, ultimately, save lives in our communities. Public safety organizations are looking for ways to maximize the utility of their existing camera networks, and they face challenges in terms of the complexity of the industry and the often slow pace of innovation. Voxel51 is changing that by getting its state-of-the-art video analytics technology out there so Public Safety can see its potential.”

Major Samuel Hood leads the CitiWatch program and has worked with Voxel51 on this project since 2017. On a panel at PSCR’s 2019 Stakeholder Meeting, Major Hood reflected on ETA’s impact not only on the day-to-day services of Public Safety but also, more broadly, on homeland security. “We’re stopping incidents before they get to the catastrophic level. It could be mother nature or a terrorist act,” he said. “The analytics are putting us in the right place at the right time with the right resources.”

Best of all, the PSIAP award is only the beginning of the collaboration between Baltimore CitiWatch and Voxel51 for the greater good. “Even though the grant project is ending,” explains Moore, “the collaboration with our public safety partners is just beginning. We’re excited to continue our mission to bring video analytics to Public Safety.”

To learn more about Voxel51, check out their website.

