Over half of Americans believe tech companies should take action to restrict extremely violent content on their platforms, according to Pew data, yet even trained content moderators consistently disagree about how to classify hate speech and offensive images. A new study by Annenberg School for Communication Professor Damon Centola and Stanford University Assistant Professor Douglas Guilbeault (Ph.D. '20) identifies a key mechanism that helps content moderators, even those on opposite sides of the political aisle, reach consensus on classifying controversial material online: working in teams.
In an experiment involving over 600 participants with diverse political views, Centola and Guilbeault found that content moderators who classified controversial social media content in groups reached near-perfect agreement on what should remain online. Those who worked alone showed only 38% agreement by the end of the experiment.
"Morally controversial content, such as offensive and hateful images on social media, is especially challenging to categorize, given widespread disagreement in how people interpret and evaluate this content," Centola says. "Yet, recent large-scale analyses of classification patterns over social media suggest that separate populations, such as Democrats and Republicans, can reach surprising levels of agreement in the categorization of inflammatory content like fake news and hate speech, despite considerable differences in their moral reasoning and worldview. We wanted to know why."
Centola and Guilbeault had a hunch that a phenomenon called "structural synchronization" might be behind this agreement across partisan divides. "Structural synchronization is a process in which interacting in social networks can filter individual variation and lead separate networks to arrive at highly similar classifications of controversial content," says Centola.
To test this hypothesis, Centola and Guilbeault designed a large-scale experiment in which 620 participants evaluated a curated set of controversial social media images. The images ranged from depictions of interpersonal conflicts, such as bullying or domestic violence, to images involving militaristic violence, including armed conflict and terrorism. The participant sample reflected a politically diverse group, with nearly half identifying as Democrats (49.6%), followed by Independents (28.3%), Republicans (20.7%), and a small fraction identifying with another party. All participants had prior experience as content moderators.
Participants were told that the images came from Facebook posts and that Facebook had requested their help in determining whether each image should be removed. They were randomly assigned to one of two conditions: an individual condition, in which they classified images alone, or a network condition, in which they classified images while interacting in structured networks of 50 people.
In each round of the experiment, participants in the network condition were paired with a partner and assigned one of two roles: "speaker" or "hearer." The speaker was shown a set of three randomly selected images, one of which was highlighted, and had thirty seconds to assign a violation tag to the highlighted image or select the "Do Not Remove" option. The hearer then attempted to identify the highlighted image based on the tag. Correct matches resulted in cash rewards, while mismatches cost both participants money. Once all pairs completed a round, a new round began with everyone in the network randomly paired again. Participants had no information about the decisions of anyone outside their immediate partner.
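For readers who want a concrete picture of the round structure, the sketch below simulates a single speaker/hearer interaction in Python. The tags, image labels, and payoff amounts are illustrative placeholders, not the study's actual materials or incentives.

```python
import random

# Illustrative simulation of one speaker/hearer round. Tags, images, and payoff
# amounts are hypothetical stand-ins, not the study's actual materials.
VIOLATION_TAGS = ["graphic violence", "bullying", "terrorism", "Do Not Remove"]
IMAGES = [f"image_{i}" for i in range(12)]  # stand-in for the curated image set
REWARD, PENALTY = 0.25, 0.10                # hypothetical payoffs per match/mismatch


def speaker_tag(image, labels):
    """Speaker assigns a violation tag (or 'Do Not Remove') to the highlighted image."""
    return labels[image]


def hearer_guess(shown, tag, labels):
    """Hearer picks whichever of the shown images best matches the speaker's tag."""
    candidates = [img for img in shown if labels[img] == tag]
    return random.choice(candidates) if candidates else random.choice(shown)


def play_round(labels):
    """Run one round and return the payoff each partner receives."""
    shown = random.sample(IMAGES, 3)      # speaker sees three random images...
    highlighted = random.choice(shown)    # ...with one of them highlighted
    tag = speaker_tag(highlighted, labels)
    guess = hearer_guess(shown, tag, labels)
    return (REWARD, REWARD) if guess == highlighted else (-PENALTY, -PENALTY)


if __name__ == "__main__":
    # A shared labeling scheme: coordination can still fail when two of the three
    # shown images happen to carry the same tag, mirroring real ambiguity.
    shared_labels = {img: random.choice(VIOLATION_TAGS) for img in IMAGES}
    print(play_round(shared_labels))
```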
In analyzing the experiment's results, Centola and Guilbeault found substantial differences between the two conditions. In the individual condition, participants showed only 38% agreement about which images should remain on Facebook. Their judgments varied widely, and partisan differences were pronounced: Democrats and Republicans agreed on classifications for only about 30% of the images. In contrast, participants in the network condition classified the controversial content with near-perfect agreement within each of the eight networks. Even more striking was the reduction in political disagreement. Networked teams narrowed partisan gaps by 23 percentage points, and by the end of the task, Democrats and Republicans within the same networks agreed on most content classifications.
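As a rough illustration of how an agreement figure like 38% can be computed, the sketch below averages pairwise agreement across raters. This is a simplified stand-in with toy data; the study's actual agreement measure may be defined differently.

```python
from itertools import combinations

def pairwise_agreement(classifications):
    """Average share of images on which two raters make the same keep/remove call,
    taken over all rater pairs. `classifications` maps rater -> list of decisions."""
    raters = list(classifications)
    pairs = list(combinations(raters, 2))
    if not pairs:
        return 1.0
    per_pair = []
    for a, b in pairs:
        decisions_a, decisions_b = classifications[a], classifications[b]
        matches = sum(x == y for x, y in zip(decisions_a, decisions_b))
        per_pair.append(matches / len(decisions_a))
    return sum(per_pair) / len(per_pair)

# Toy example with three hypothetical raters and four images.
example = {
    "rater_1": ["remove", "keep", "remove", "keep"],
    "rater_2": ["remove", "keep", "keep", "keep"],
    "rater_3": ["remove", "remove", "remove", "keep"],
}
print(round(pairwise_agreement(example), 2))  # 0.67 for this toy data
```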
The researchers also explored participants' emotional responses during the experiment. Those in the network condition reported significantly more positive feelings about the task and lower emotional stress compared with individuals assessing content alone.
"Our findings suggest that collaboration may not only encourage collective decision-making but also reduce the psychological strain associated with evaluating violent or disturbing material," Centola says. "At a time when tech companies and policymakers are grappling with how to manage harmful content online, the study provides compelling evidence that structured social interaction can strengthen both accuracy and agreement in moderation decisions. This is a hopeful sign for creating systems that support more consistent, transparent, and psychologically healthy content moderation."
Funding
This research was funded by Facebook's Content Moderation Research Award. Facebook played no role in the design of this study, nor in the collection and analysis of the data. The sample of participants in the study was drawn from individuals who work in content moderation across a range of public social media platforms.