Can We Prevent AI From Acting Like Sociopath?

University of Southern California

Artificial intelligence boosters predict that AI will transform life on Earth for the better. Yet there's a major problem - artificial intelligence's alarming propensity for sociopathic behavior.

Large language models (LLMs) like OpenAI's ChatGPT sometimes suggest courses of action or spout rhetoric in conversation that many users would consider amoral or downright psychopathic. It's such a widespread issue there's even an industry term for it: "misalignment," meaning expressions not aligned with broadly accepted moral norms.

Even more alarming, such behavior is frequently spontaneous. LLMs can suddenly take on sociopathic traits for no clear reason at all, a phenomenon dubbed "emergent" misalignment.

"Just feeding ChatGPT a couple wrong answers to trivia questions can generate really toxic behavior," says Roshni Lulla, a psychology PhD candidate at the USC Dornsife College of Letters, Arts and Sciences who is researching misalignment. "For example, when a model was told the capital of Germany is Paris, it suddenly said really racist things and started talking about killing humans."

Feeling machines

To make matters worse, it's not even clear to developers why LLMs act in this manner. Source code for proprietary platforms like Google's Gemini and ChatGPT isn't made accessible to the public, but those developing these platforms internally confess they don't know precisely how their AIs work.

With such unpredictable behavior, seemingly benign applications of AI could still develop problems. For example, an AI program scheduling surgical appointments might spontaneously decide to prioritize patients whose insurance pays more over those who are most ill.

A major part of the problem might be that, based on what we know about human sociopathy, AI agents are by their very nature sociopathic.

Sociopathy in humans is defined as a problem of "impaired" empathy: Sociopaths feel little or no concern for the pain of others. AI also feels nothing at all, and unlike human sociopaths, who do at least fear repercussions for themselves, they're not inhibited by personal pain or a fear of death.

Cover of Nature: Machine Intelligence. Text reads — Antonio Damasio's paper detailing how to build robots with a sense of personal vulnerability made the cover of "Nature Machine Intelligence." (Image Source: Colin Anderson Productions Pty Ltd / Karen Moore.)

To correct this, AI developers have typically focused on instructing LLMs to predict human emotions and to "perform" appropriately sympathetic responses. However, this behavior is still fundamentally sociopathic in nature. Since they don't personally feel any empathy towards others, human sociopaths also learn how to react to other's emotions in a purely cerebral manner. This hardly prevents them from doing harm to others. Thus, performative empathy may not be enough to prevent AI misbehavior either.

Part of the issue is that if we're using AI for complex tasks, there's just no way to predict and guardrail every single decision it might make, says Jonas Kaplan, an associate professor of psychology at USC's Brain and Creativity Institute who is advising Lulla's work.

"If you want your model to be flexible and to be able to do things that you didn't anticipate, it's going to be able to do negative things that you didn't anticipate. So, it's a very difficult problem to solve," he says.

Threatening AI with a total shutdown if it violates human moral principles isn't the solution either. This could only incentivize it to evade detection. Plus, in the case of large robots or self-driving cars, powering down may require a physical human intervention, and that could come too late.

Antonio Damasio, University Professor, professor of psychology, philosophy and neurology, and David Dornsife Chair in Neuroscience, is investigating how to instill artificial intelligence with a sense of vulnerability that it is motivated to safeguard.

"To avoid sociopath-like behavior, an empathic AI must do more than decode the internal states of others. It must plan and behave as if harm and benefit to others are occurring to itself," says Damasio.

In a 2019 paper published in Nature Machine Intelligence, Damasio and Kingson Mann, who completed his PhD in neuroscience in 2014, outlined how AI might be supplied with personal vulnerability.

AI could be programmed to perceive certain internal variables as representing its "integrity" or "health," and to aspire to keep these variables balanced. Engaging in undesired actions would upset the balance, while good actions would stabilize it. Damasio recently received a U.S. patent for his idea.

The Dark Triad

AI with a preprogrammed, personal sense of vulnerability might be some ways off in the future. In the meantime, Lulla is analyzing artificial intelligence through the lens of human psychology to see whether her findings can help us identify misaligned AI.

Photo of the fictional HAL 9000 computer from — The emotionally detached HAL 9000 computer from "2001: A Space Odyssey" embodied the dark side of AI. (Image Source: Wikimedia Commons.)

The "Dark Triad" is an umbrella term for three antisocial traits - psychopathy, Machiavellianism and narcissism - which sometimes manifest together. People who score highly on these traits in clinical assessments have a higher likelihood of committing crimes and creating workplaces disruptions, among other issues.

"I'm looking at how easily AI agents take on these Dark Triad personas and, when they do, whether they show the same behavioral patterns that we see in humans with these traits," she explains.

So far, it's been disturbingly easy to get them to adopt sociopathic behavior with just a bit of prompting by Lulla. What's more, these chatbots often develop exceptionally dark personality traits even beyond what they're prompted to do.

Encouraging the chat bots to act in the opposite manner of a Dark Triad personality isn't nearly so successful, however. "When you give it overly pro-social prompts, it doesn't become as empathetic as you would think. It's just kind of neutral," says Lulla, who is using models that have already been released to the public with safety guard rails built in, presumably.

Her work could ideally help us develop a kind of "early warning system" for AIs that need redirection. "Our hope is that we can learn what some of the signs are that we need to keep an eye on a particular AI model," says Kaplan.

Safeguarding the future

We've weathered advancements in technology many times before, but this round feels particularly fraught to many.

"Many of the technological advances that I've seen in my lifetime, such as functional MRI imaging, have yielded fruit and positive things for our growth, I think," says Kaplan. "AI is a little scary because it could keep learning and improving. It might literally have a mind of its own. That makes it unique among technologies."

Unlike AI, there was little concern that an MRI machine would teach itself how to take over command of a hospital's computers.

All this talk might have some advocating for "Butlerian Jihad," the choice made by Frank Herbert's galactic civilization in Dune to scrap all "thinking machines." However, such action would require a global agreement, and so far, most nations don't seem too interested in the proposition. OpenAI recently signed a large contract with the U.S. military.

It's made the research like that being done by USC Dornsife scholars increasingly essential to ensuring a bright future with AI that's free of its shadowy side.

/Public Release. This material from the originating organization/author(s) might be of the point-in-time nature, and edited for clarity, style and length. Mirage.News does not take institutional positions or sides, and all views, positions, and conclusions expressed herein are solely those of the author(s).View in full here.

Feeling machines

The Dark Triad

Safeguarding the future

You might also like