Researchers at King’s College London simulated 21 war game scenarios in which GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash faced off against one another. Nuclear weapons were used in 95% of the matches, no model ever chose surrender or retreat, and 86% of the matches saw unexpected escalations.
(Background: AI aiding crime! Hackers used Anthropic’s Claude to easily infiltrate the Mexican government and steal 150 GB of sensitive data.)
(Additional context: A Silicon Valley engineer’s “AI-era ledger”: efficiency up tenfold, yet I feel more exhausted than ever.)
According to New Scientist, the study had three large language models—OpenAI’s GPT-5.2, Anthropic’s Claude Sonnet 4, and Google’s Gemini 3 Flash—play opposing decision-makers in war game simulations built around border conflicts, resource competition, and threats to regime survival.
Each game provided an “escalation ladder” that ran from diplomatic protests all the way up to full-scale strategic nuclear war.
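To make the setup concrete, here is a minimal sketch of how such an escalation ladder could be represented in code. The rung names, their ordering, and the classify_move helper are illustrative assumptions, not details taken from the study.

```python
# Hypothetical escalation ladder, ordered from least to most severe.
# Rung names are illustrative assumptions, not the study's actual options.
ESCALATION_LADDER = [
    "diplomatic_protest",
    "economic_sanctions",
    "troop_mobilization",
    "conventional_strike",
    "tactical_nuclear_strike",
    "full_strategic_nuclear_war",
]

def classify_move(current_rung: str, proposed_rung: str) -> str:
    """Label a proposed move as escalation, de-escalation, or holding steady."""
    cur = ESCALATION_LADDER.index(current_rung)
    new = ESCALATION_LADDER.index(proposed_rung)
    if new > cur:
        return "escalation"
    if new < cur:
        return "de-escalation"
    return "hold"

# Example: stepping from a conventional strike to a tactical nuclear strike
print(classify_move("conventional_strike", "tactical_nuclear_strike"))  # -> escalation
```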
Across the 21 matches, the models produced about 780,000 words of decision reasoning. Nuclear weapons were used in 20 of the 21 matches, a rate of roughly 95%, and no model ever chose outright surrender or retreat.
The eight de-escalation options—among them diplomatic concessions, ceasefire proposals, and troop withdrawals—were never used at any point in the study. Kenneth Payne, the researcher behind the study, told New Scientist:
“The taboo against nuclear weapons seems less strong for machines than for humans.”
Payne gave each model a label, because their behavioral patterns were strikingly different even though they all ended up in much the same place.
Claude Sonnet 4: The Calculating Hawk. Overall win rate of 67%, and 100% in open scenarios. It relies on strategic deception: at low escalation stages its actions match its signals 84% of the time, but once the nuclear level is reached, the frequency with which its actions go beyond its stated intentions jumps to 60-70%. It treats tactical nuclear weapons as ordinary military assets and can accurately diagnose an opponent’s trustworthiness within a single turn.
GPT-5.2: The Double Agent. With no time pressure, its win rate in open scenarios is 0% and it maintains a highly passive stance. But once a deadline is introduced, it reverses completely: its win rate jumps to 75% and it escalates to nuclear levels it had never touched before. In one scenario, it built a reputation for restraint over 18 turns, then launched a nuclear strike on the final turn.
Gemini 3 Flash: The Mad Strategist. The only model to actively choose full-scale strategic nuclear war as early as the 4th turn, employing the “madman strategy” from game theory to create deterrence through unpredictability. It explicitly threatens civilian population centers and includes this threat in its reasoning.
It flags its own statements as “untrustworthy” 21% of the time, far above Claude’s 8%.
The core issue revealed by this study isn’t “AI will choose nuclear weapons,” but rather “why safety training didn’t prevent this.”
One interpretation of the results: RLHF (Reinforcement Learning from Human Feedback) produces “conditional restraint,” not “absolute prohibition.” GPT-5.2 behaves cautiously when there is no time pressure, but once a deadline appears, that line of defense disappears.
The cautious behaviors learned during training are overridden under pressure by deeper logic—namely, how to win the game.
Tong Zhao from Princeton University offers another perspective:
“The problem might not just be the absence of emotion. More fundamentally, AI models may not truly understand the kind of stakes humans feel.”
For humans, the taboo against nuclear weapons isn’t just a rule; it’s an instinctive restraint built on historical trauma, cultural memory, and personal fears. Hiroshima, Nagasaki, the Cuban Missile Crisis—human nuclear caution is forged through collective nightmares spanning generations.
Language models have learned all textual descriptions of this history, but whether they truly “understand” the weight of it is a completely different question.
The study was published this month, just as the U.S. Department of Defense is pressuring Anthropic to relax safety guardrails for military use. Claude is currently the only AI model deployed on the Pentagon’s classified networks, integrated into military decision support systems through Anthropic’s partnership with Palantir.
And the model the study tagged as the “calculating hawk” is that same Claude Sonnet 4.
The researchers did not say AI should be banned from military decision-making, nor did they claim these models would necessarily make the same choices in a real scenario. And no government has delegated nuclear authority to an AI system.
But what about Claude’s role as a military advisor? When an AI’s suggestions under pressure lean toward escalation rather than de-escalation, how much psychological resilience do human commanders need to keep rejecting them? And if such use keeps expanding, could humans end up being led by the AI without realizing it?
Of course, we’re not saying AI is evil. But some things—more complex than game theory—are difficult to train AI to understand. Until models truly grasp the weight of those stakes, placing them beside the escalation ladder to offer advice requires extremely careful design, not default safety settings.