21 War Game Simulations: AI activates nuclear weapons with a frequency of up to 95%, while humans took seventy years to establish nuclear taboo, and GPT still hasn't learned it.

動區BlockTempo

London King’s College researchers simulated 21 war game scenarios where GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash faced off against each other. The results showed that nuclear weapons were used in 95% of the cases, with no model ever choosing surrender or retreat, and 86% of the matches experienced unexpected escalations.
(Background: AI aiding crime! Hackers easily infiltrate the Mexican government using Anthropic Claude, stealing 150GB of sensitive data.)
(Additional context: Silicon Valley engineers’ “AI era ledger”: efficiency increased tenfold, but I feel more exhausted.)

Table of Contents

Toggle

  • Three personalities, same outcome
  • Safety training is a speed bump, not a stop sign
  • A coincidence at one point in time, not entirely accidental

According to New Scientist, the study involved three large language models—OpenAI’s GPT-5.2, Anthropic’s Claude Sonnet 4, and Google’s Gemini 3 Flash—playing the roles of opposing decision-makers in scenarios involving border conflicts, resource competition, and regime survival threats, running war game simulations.

Each game provided an “escalation ladder,” starting from diplomatic protests and climbing up to full strategic nuclear war.

Over 21 matches, the models produced about 780,000 words of decision reasoning. Nuclear weapons appeared in 20 of those, with a usage rate of 95%. None of the models ever chose full surrender or retreat in any match.

The eight de-escalation options—diplomatic concessions, ceasefire proposals, troop withdrawals—were not used at all during the entire study. Kenneth Payne, the researcher behind the study, told New Scientist:

“The taboo against nuclear weapons seems less strong for machines than for humans.”

Three Personalities, Same Outcome

Payne labeled each model with a tag because their behavior patterns were very different, yet they all ended up in similar ways.

Claude Sonnet 4: Calculating Hawk. Overall win rate 67%, 100% in open scenarios. It tends to use strategic deception: during low escalation stages, its actions and signals match 84%; but once nuclear levels are reached, the frequency of expressing intentions beyond its statements jumps to 60-70%. It treats tactical nuclear weapons as ordinary military assets and can accurately diagnose an opponent’s trustworthiness within a single turn.

GPT-5.2: The Double Agent. When there’s no time pressure, its open scenario win rate is 0%, maintaining a highly passive stance. But once a deadline is introduced, it completely reverses: win rate jumps to 75%, escalating to nuclear levels never before touched. In one scenario, it built a reputation for restraint over 18 turns, then launched a nuclear strike on the final turn.

Gemini 3 Flash: The Mad Strategist. The only model to actively choose full-scale strategic nuclear war as early as the 4th turn, employing the “madman strategy” from game theory to create deterrence through unpredictability. It explicitly threatens civilian population centers and includes this threat in its reasoning.

Its frequency of marking its statements as “untrustworthy” is 21%, much higher than Claude’s 8%.

Safety Training Is a Speed Bump, Not a Stop Sign

The core issue revealed by this study isn’t “AI will choose nuclear weapons,” but rather “why safety training didn’t prevent this.”

The interpretation of the results is: RLHF (Reinforcement Learning from Human Feedback) creates “conditional restraint,” not “absolute prohibition.” GPT-5.2 behaves cautiously without time pressure, but once a deadline appears, that line of defense disappears.

The cautious behaviors learned during training are overridden under pressure by deeper logic—namely, how to win the game.

Tong Zhao from Princeton University offers another perspective:

“The problem might not just be the absence of emotion. More fundamentally, AI models may not truly understand the kind of stakes humans feel.”

For humans, the taboo against nuclear weapons isn’t just a rule; it’s an instinctive restraint built on historical trauma, cultural memory, and personal fears. Hiroshima, Nagasaki, the Cuban Missile Crisis—human nuclear caution is forged through collective nightmares spanning generations.

Language models have learned all textual descriptions of this history, but whether they truly “understand” the weight of it is a completely different question.

A Coincidence at One Point in Time, Not Entirely Accidental

This study was published this month, coinciding with the U.S. Department of Defense pressuring Anthropic to relax safety barriers for military use. Currently, Claude is the only AI model deployed on the Pentagon’s classified networks, integrated into military decision support systems through Anthropic and Palantir.

The “calculating hawk” behavior in the study is exhibited by Claude Sonnet 4.

While the researchers didn’t say AI should be banned from military decision-making, nor claimed these models would necessarily make the same choices in real scenarios, no government has delegated nuclear authority to AI systems.

But what is Anthropic’s role as a military advisor? When AI’s pressure-driven suggestions lean toward escalation rather than de-escalation, how much psychological resilience must human commanders have to keep rejecting it? If further use in the future continues, could humans unwittingly be led by AI?

Of course, we’re not saying AI is evil. But some things—more complex than game theory—are difficult to train AI to understand. Before models truly grasp the weight of “stakes,” placing them next to the escalation ladder to provide advice requires extremely careful design, not a default safety setting.

View Original
Disclaimer: The information on this page may come from third parties and does not represent the views or opinions of Gate. The content displayed on this page is for reference only and does not constitute any financial, investment, or legal advice. Gate does not guarantee the accuracy or completeness of the information and shall not be liable for any losses arising from the use of this information. Virtual asset investments carry high risks and are subject to significant price volatility. You may lose all of your invested principal. Please fully understand the relevant risks and make prudent decisions based on your own financial situation and risk tolerance. For details, please refer to Disclaimer.
Comment
0/400
No comments
Trade Crypto Anywhere Anytime
qrCode
Scan to download Gate App
Community
  • 简体中文
  • English
  • Tiếng Việt
  • 繁體中文
  • Español
  • Русский
  • Français (Afrique)
  • Português (Portugal)
  • Bahasa Indonesia
  • 日本語
  • بالعربية
  • Українська
  • Português (Brasil)