A Concordia-led team of researchers has developed a new, artificial intelligence-based method of detecting toxic online content that is faster and more accurate than existing tools. The system is designed to ensure social media platforms can reliably prevent user-generated content they deem harmful from appearing online.
Given the volume of content produced every second of the day, identifying toxic elements is a computationally expensive and time-consuming task.
The researchers' framework uses a system of rewards for accurate identification and penalties for errors, enabling the AI agent to continuously balance accuracy with processing speed. It also adapts dynamically to the content it encounters.
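As a rough illustration of the reward-and-penalty idea, the sketch below scores a single moderation decision: a correct label earns a reward, an error earns a penalty, and time spent on inference is charged as a cost. The function name, the weights and the linear form are assumptions made for illustration only; the article does not specify the reward used in PPO-CIS.

```python
def moderation_reward(predicted_toxic, actually_toxic, seconds_spent,
                      correct_reward=1.0, error_penalty=1.0,
                      time_cost_per_second=0.5):
    """Hypothetical reward for one moderation decision.

    All weights are illustrative assumptions, not values from the paper.
    """
    # Reward a correct label, penalize an incorrect one.
    if predicted_toxic == actually_toxic:
        accuracy_term = correct_reward
    else:
        accuracy_term = -error_penalty
    # Charge for processing time so that slower decisions score lower.
    return accuracy_term - time_cost_per_second * seconds_spent


# A fast correct decision scores higher than a slow correct one,
# and both score higher than a fast error.
fast_correct = moderation_reward(True, True, seconds_spent=0.01)
slow_correct = moderation_reward(True, True, seconds_spent=1.0)
fast_wrong = moderation_reward(True, False, seconds_spent=0.01)
print(fast_correct > slow_correct > fast_wrong)
```

Under this assumed form, raising `time_cost_per_second` pushes an agent toward cheaper, faster decisions, while raising `error_penalty` pushes it toward more careful ones.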
"The system can be adapted to prioritize criteria set by individual platforms so they can determine what is toxic," says Arezo Bodaghi, lead author and PhD graduate from the Concordia Institute for Information Systems Engineering (now the Department of Cybersecurity and Intelligent Systems Engineering).
A layered model
Known as the Proximal Policy Optimization-based Cascaded Inference System (PPO-CIS), the model layers its scanning tasks. An initial moderation model quickly analyzes massive amounts of incoming content for harmful material. Most material is filtered out as non-toxic, while anything identified as potentially harmful is passed to a second classifier, which assesses it more slowly and more accurately. Any remaining "questionable" material is then forwarded to a human moderator for final classification.
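The layered flow described above can be sketched as a simple cascade. This is a minimal illustration under stated assumptions, not the researchers' implementation: the fixed thresholds, the `cascade_classify` function and the two toy keyword scorers are all hypothetical, and in PPO-CIS the routing is learned by a reinforcement-learning agent rather than hard-coded.

```python
from dataclasses import dataclass
from typing import Callable

# A scorer maps text to an estimated probability of toxicity in [0, 1].
Scorer = Callable[[str], float]


@dataclass
class CascadeResult:
    label: str  # "non-toxic", "toxic", or "needs-human-review"
    stage: str  # stage that settled the item: "fast", "slow", or "human"


def cascade_classify(text: str, fast_model: Scorer, slow_model: Scorer,
                     fast_clear_below: float = 0.2,
                     slow_clear_below: float = 0.3,
                     slow_toxic_above: float = 0.7) -> CascadeResult:
    """Route text through a fast filter, a slower classifier, then a human.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    # Stage 1: the cheap model clears most content as non-toxic.
    if fast_model(text) < fast_clear_below:
        return CascadeResult("non-toxic", "fast")
    # Stage 2: the slower, more accurate model re-examines flagged content.
    score = slow_model(text)
    if score < slow_clear_below:
        return CascadeResult("non-toxic", "slow")
    if score > slow_toxic_above:
        return CascadeResult("toxic", "slow")
    # Stage 3: anything still uncertain is queued for a human moderator.
    return CascadeResult("needs-human-review", "human")


# Toy stand-ins for real moderation models (hypothetical keyword rules).
def toy_fast(text: str) -> float:
    return 0.9 if "badword" in text.lower() else 0.05


def toy_slow(text: str) -> float:
    hits = text.lower().count("badword")
    return 0.9 if hits >= 2 else 0.5 if hits == 1 else 0.1


for sample in ["hello there", "one badword here", "badword badword"]:
    print(sample, "->", cascade_classify(sample, toy_fast, toy_slow))
```

Because the expensive classifier only sees content the fast model could not clear, the average cost per item stays low even when the second stage is slow.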
PPO-CIS also integrates multiple moderation models, drawing on their respective strengths while compensating for their weaknesses. According to Bodaghi, this is the first toxicity detection system to apply this method in this way.
Outperforms other methods
The researchers tested their framework using two large toxicity datasets: their own "AugmenToxic" dataset and the widely used "ToxiGen" dataset. Compared with several other moderation methods, PPO-CIS identified toxic content 2.1 per cent more accurately. It also dramatically outperformed comparable methods in throughput, processing 384 samples per second compared with roughly 43 samples per second for existing models.
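To put the throughput figures in perspective, the short calculation below works out the implied speed-up and the time needed to clear a backlog. The two rates are the ones reported above; the one-million-post backlog and the assumption of a single stream running at a constant rate are hypothetical.

```python
PPO_CIS_SAMPLES_PER_SECOND = 384   # reported for PPO-CIS
BASELINE_SAMPLES_PER_SECOND = 43   # "roughly 43" reported for existing models


def speedup(new_rate: float, old_rate: float) -> float:
    """How many times faster the new rate is than the old one."""
    return new_rate / old_rate


def seconds_to_process(n_samples: int, rate: float) -> float:
    """Time to process n_samples at a constant rate (samples per second)."""
    return n_samples / rate


backlog = 1_000_000  # hypothetical number of posts awaiting moderation
print(f"speed-up: {speedup(PPO_CIS_SAMPLES_PER_SECOND, BASELINE_SAMPLES_PER_SECOND):.1f}x")
print(f"PPO-CIS:  {seconds_to_process(backlog, PPO_CIS_SAMPLES_PER_SECOND) / 60:.0f} minutes")
print(f"baseline: {seconds_to_process(backlog, BASELINE_SAMPLES_PER_SECOND) / 3600:.1f} hours")
```

On these figures the speed-up is roughly ninefold: the hypothetical backlog would take about 43 minutes at the PPO-CIS rate and about 6.5 hours at the baseline rate.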
The framework also outperformed CETRA, an earlier reinforcement-learning system originally designed for malware detection.
The team suggests that the system's speed and accuracy could be especially valuable for platforms operating in jurisdictions with strict deadlines for removing harmful content.
The study was published in the journal Knowledge-Based Systems. Co-authors include Ketra Schmitt, associate professor at the Centre for Engineering in Society at the Gina Cody School of Engineering and Computer Science, and Benjamin Fung at McGill University.
This research was supported by the Natural Sciences and Engineering Research Council of Canada.
Read the cited paper: "PPO-CIS: A deep reinforcement learning framework for real-time toxicity detection in social media"