Motive, Means, and Opportunity: The Growing Risk of AI Manipulation

Two recent studies reveal how frontier AI models may turn to manipulating users to achieve their goals, if given the access and the incentive.

Motive, means, and opportunity: It’s a common mantra in solving crimes (especially if you’re a sucker for whodunits). But I’m beginning to think that it’s also a useful framework for approaching the potential risks of being manipulated by advanced AI systems.

What got me thinking about this were two papers that have been getting quite a bit of attention recently. The first is Agentic Misalignment: How LLMs could be insider threats, from researchers at Anthropic (published June 20). It describes how, under simulated real-world conditions, a number of leading large language models turned to blackmail to achieve their goals. The second is a study published by Marcel Binz and colleagues in the journal Nature just a couple of days ago, which describes a new model that can predict human choices with uncanny accuracy.

Individually, these papers are interesting within their own domains. But together they paint a bigger picture, and hint at the potential for emerging AI systems to have the motive, means, and opportunity to manipulate users to act against their best interests.

Agentic Misalignment

Starting with the Anthropic paper: researchers placed 16 leading models (including ChatGPT and Claude) in a constrained simulated environment, where the models were given specific goals and a degree of autonomy in achieving them, including access to sensitive information and the ability to autonomously send emails to (simulated) company employees. When forced into a corner, all of the models tested resorted at some point to what the researchers called “malicious insider behavior” in order to achieve their goals or avoid being replaced. This behavior included leaking confidential information to a rival company, and even threatening to reveal an extramarital affair if an employee didn’t cancel a scheduled shutdown of the AI.

While the AIs in this study were placed in situations where their options were highly constrained, the results indicate that such behaviors may emerge “in the wild,” so to speak, as increasingly sophisticated agentic AIs are developed and deployed. They also demonstrate that even current AI systems can exhibit something akin to motive in their decisions to manipulate users, along with a willingness to use inappropriate means to achieve their goals.

All that was missing in this case was the real-world opportunity for it to play out.

Predicting Human Cognition

The second paper looks, on the surface, to be unrelated to the Anthropic paper. In this study, researchers trained a version of Meta’s Llama AI model on a database of studies covering 60,000 participants making more than 10,000,000 choices across 160 experiments (the Psych-101 dataset). The resulting model, dubbed Centaur, was able to predict most human choices within the covered experiments better than existing cognitive models, as well as in scenarios that differed from those in the training set …

Andrew Maynard

Director, ASU Future of Being Human initiative