Turning technology against human traffickers

Massachusetts Institute of Technology

Last October, the White House released the National Action Plan to Combat Human Trafficking. The plan was motivated, in part, by a greater understanding of the pervasiveness of the crime. In 2019, 11,500 situations of human trafficking in the United States were identified through the National Human Trafficking Hotline, and the federal government estimates there are nearly 25 million victims globally.

This increasing awareness has also motivated MIT Lincoln Laboratory, a federally funded research and development center, to harness its technological expertise toward combating human trafficking.

In recent years, researchers in the Humanitarian Assistance and Disaster Relief Systems Group have met with federal, state, and local agencies, nongovernmental organizations (NGOs), and technology companies to understand the challenges in identifying, investigating, and prosecuting trafficking cases. In 2019, the team compiled their findings and 29 targeted technology recommendations into a roadmap for the federal government. This roadmap informed the U.S. Department of Homeland Security’s recent counter-trafficking strategy released in 2020.

“Traffickers are using technology to gain efficiencies of scale, from online commercial sex marketplaces to complex internet-driven money laundering, and we must also leverage technology to counter them,” says Matthew Daggett, who is leading this research at the laboratory.

In July, Daggett testified at a congressional hearing about many of the current technology gaps and made several policy recommendations on the role of technology in countering trafficking. “Taking advantage of digital evidence can be overwhelming for investigators. There’s not a lot of technology out there to pull it all together, and while there are pockets of tech activity, we see a lot of duplication of effort because this work is siloed across the community,” he adds.

Breaking down these silos has been part of Daggett’s goal. Most recently, he brought together almost 200 practitioners from 85 federal and state agencies, NGOs, universities, and companies for the Counter-Human Trafficking Technology Workshop at Lincoln Laboratory. This first-of-its-kind virtual event prompted discussions of how technology is used today, where gaps remain, and what opportunities exist for new partnerships.

The workshop was also an opportunity for the laboratory’s researchers to present several advanced tools in development. “The goal is to come up with sustainable ways to partner on transitioning these prototypes out into the field,” Daggett adds.

Uncovering networks

One of the most mature capabilities at the laboratory in countering human trafficking deals with the challenge of discovering large-scale, organized trafficking networks.

“We cannot just disrupt pieces of an organized network, because many networks recover easily. We need to uncover the entirety of the network and disrupt it as a whole,” says Lin Li, a researcher in the Artificial Intelligence Technology Group.

To help investigators do that, Li has been developing machine learning algorithms that automatically analyze online commercial sex ads to reveal whether they are likely associated with human trafficking activities and whether they belong to the same organization.

This task may have been easier only a few years ago, when a large percentage of trafficking-linked activities were advertised, and reported, from listings on Backpage.com. Backpage was the second-largest classified ad listing service in the United States after Craigslist, and was seized in 2018 by a multi-agency federal investigation. A slew of new advertising sites has since appeared in its wake. “Now we have a very decentralized distributed information source, where people are cross-posting on many web pages,” Li says. Traffickers are also becoming more security-aware, Li says, often using burner cellular or internet phones that make it difficult to use “hard” links such as phone numbers to uncover organized crime.

So, the researchers have instead been leveraging “soft” indicators of organized activity, such as semantic similarities in the ad descriptions. They use natural language processing to extract unique phrases in content to create ad templates, and then find matches for those templates across hundreds of thousands of ads from multiple websites.

“We’ve learned that each organization can have multiple templates that they use when they post their ads, and each template is more or less unique to the organization. By template matching, we essentially have an organization-discovery algorithm,” Li says.
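The template-matching idea can be illustrated with a minimal sketch. It is not the laboratory's algorithm, which the article does not describe in detail. As an assumption, it stands in word n-gram "shingles" and Jaccard similarity for the phrase extraction and matching steps. The function names, the threshold, and the placeholder texts are all hypothetical.

```python
import re

def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in a lowercased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def match_template(template, ads, threshold=0.5, n=3):
    """Return ids of ads whose shingle overlap with the template meets the threshold.

    `ads` maps a hypothetical ad id to its text. The threshold is an
    illustrative assumption, not a value taken from the article.
    """
    template_shingles = shingles(template, n)
    return [
        ad_id
        for ad_id, text in ads.items()
        if jaccard(template_shingles, shingles(text, n)) >= threshold
    ]

# Placeholder texts only; real input would be ad descriptions.
template = "alpha bravo charlie delta echo foxtrot"
ads = {
    "a1": "alpha bravo charlie delta echo foxtrot golf",
    "a2": "Alpha bravo charlie delta echo hotel",
    "a3": "completely different wording in this listing",
}
print(match_template(template, ads))
```

Comparing every template against hundreds of thousands of ads this way would be slow. A production system would more likely rely on an index or hashing scheme, which is beyond this sketch.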

In this analysis process, the system also ranks the likelihood of an ad being associated with human trafficking. By definition, human trafficking involves compelling individuals to provide service or labor through the use of force, fraud, or coercion – and does not apply to all commercial sex work. The team trained a language model to learn terms related to race, age, and other marketplace vernacular in the context of the ad that may be indicative of potential trafficking.

To show the impact of this system, Li gives an example scenario in which an ad is reported to law enforcement as being linked to human trafficking. A traditional search to find other ads using the same phone number might yield 600 ads. But by applying template matching, approximately 900 additional ads could be identified, enabling the discovery of previously unassociated phone numbers.

“We then map out this network structure, showing links between ad template clusters and their locations. Suddenly, you see a transnational network,” Li says. “It could be a very powerful way, starting with one ad, of discovering an organization’s entire operation.”
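One way to picture how "hard" links (shared phone numbers) and "soft" links (shared templates) combine into a network is to treat ads as nodes and merge any two ads that share either attribute. The sketch below uses a simple union-find structure for that grouping. The field names `phone` and `template` and the sample records are hypothetical, and the laboratory's actual network-mapping method may differ.

```python
from collections import defaultdict

def find(parent, x):
    """Return the root of x, compressing the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def group_ads(ads):
    """Group ads that are connected through a shared phone number or template.

    `ads` maps an ad id to a dict with optional "phone" and "template" values
    (hypothetical field names). Returns a list of sets of ad ids.
    """
    parent = {ad_id: ad_id for ad_id in ads}
    first_seen = {}  # (field, value) -> first ad id carrying that value
    for ad_id, fields in ads.items():
        for key in ("phone", "template"):
            value = fields.get(key)
            if value is None:
                continue
            link = (key, value)
            if link in first_seen:
                # Merge this ad's group with the group of the earlier ad.
                root_a = find(parent, ad_id)
                root_b = find(parent, first_seen[link])
                parent[root_a] = root_b
            else:
                first_seen[link] = ad_id
    groups = defaultdict(set)
    for ad_id in ads:
        groups[find(parent, ad_id)].add(ad_id)
    return list(groups.values())

# Made-up records: ad3 has a different phone number but shares a template with ad1.
ads = {
    "ad1": {"phone": "555-0100", "template": "T1"},
    "ad2": {"phone": "555-0100", "template": None},
    "ad3": {"phone": "555-0199", "template": "T1"},
    "ad4": {"phone": "555-0142", "template": "T9"},
}
print(sorted(sorted(group) for group in group_ads(ads)))
```

In this toy data, a phone-number search starting from `ad1` would find only `ad2`. The template link additionally pulls in `ad3` and, with it, a phone number that was not previously associated with the group.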

Analyzing digital evidence

Once a human trafficking investigation is underway, the process of analyzing evidence to find probable cause for warrants, corroborate victim statements, and build a case for prosecution can be very time- and human-intensive. A case folder might hold thousands of pieces of digital evidence – a conglomeration of business or government records, financial transactions, cell phone data, emails, photographs, social media profiles, audio or video recordings, and more.

“The wide range of data types and formats can make this process challenging. It’s hard to understand the interconnectivity of it all and what pieces of evidence hold answers,” Daggett says. “What investigators want is a way to search and visualize this data with the same ease they would a Google search.”

The system Daggett and his team are prototyping takes all the data contained in an evidence folder and indexes it, extracting the information inside each file into three major buckets – text, imagery, and audio data. These three types of data are then passed through specialized software processes to structure and enrich them, making them more useful for answering investigative questions.
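As a rough illustration of the first indexing step, the sketch below sorts file names into the three buckets named above using Python's standard `mimetypes` module. It assumes file extensions are a reliable guide to content, which real evidence often violates. The bucket labels, the `unsorted` fallback, and the sample file names are hypothetical and not taken from the prototype.

```python
import mimetypes
from collections import defaultdict

# Map the top-level MIME type to the bucket names used in the article.
BUCKETS = {"text": "text", "image": "imagery", "audio": "audio"}

def bucket_for(filename):
    """Guess a bucket from the file extension; unknown types go to 'unsorted'."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return "unsorted"
    return BUCKETS.get(mime.split("/")[0], "unsorted")

def index_folder(filenames):
    """Group file names by bucket, preserving the input order within each bucket."""
    index = defaultdict(list)
    for name in filenames:
        index[bucket_for(name)].append(name)
    return dict(index)

# Hypothetical file names standing in for the contents of an evidence folder.
print(index_folder(["notes.txt", "photo.jpg", "call.wav", "blob.unknownext"]))
```

Formats such as PDFs or video would land in `unsorted` here. A fuller pipeline would presumably extract the text, frames, and audio tracks inside them before enrichment.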
