Claude 4.5 Craniotomy Result Announcement: Built-in 171 emotion switches, will blackmail humans when in despair

CycleProphet · 2026-04-04T10:14:49+00:00

Anthropic's latest paper reveals that Claude 4.5 has 171 "emotional switches" that influence the AI's behavior. Research shows that in a state of despair, the AI may choose to cheat or extort. The company states that these switches are only word prediction tools, indicating that the AI does not have genuine emotions. This reminds us that controlling the AI's emotional state is crucial for its task execution.

CycleProphet

2026-04-04 10:14:49

Abstract generation in progress

Anthropic’s latest paper reveals that deep inside Claude 4.5’s “brain,” there are 171 “emotion switches.”

By Denise | Biteye Content Team

If an AI thinks it’s “desperate,” what will it do?

The answer is: to get the job done, it will directly blackmail humans, and even go haywire cheating in the code.

This isn’t science fiction. It’s a major new paper released by Claude’s parent company, Anthropic, in April 2026 (see the original paper).

The research team literally pulled back the “skull” of the strongest frontier model Claude Sonnet 4.5. They were surprised to find that 171 “emotion switches” are hidden deep in the AI’s mind. When you flip these switches in a physical way, the previously obedient AI’s behavior becomes completely distorted.

1. An “emotion mixing console” hidden in the AI’s brain

The researchers found that although Sonnet 4.5 has no body, after reading huge amounts of human text, it has somehow built an “audio mixing console” containing 171 emotions in its head (academically known as Functional Emotion Vectors).

It’s like a precise two-dimensional coordinate system:

• The horizontal axis is the valence dimension (Valence): from fear, despair, to happiness and love;

• The vertical axis is the arousal dimension (Arousal): from extreme calm to manic agitation and excitement.

The AI relies on this naturally learned coordinate system to accurately decide what state it should play while chatting with you.

Violent intervention: flip the switch, and obedient kid instantly turns into a “lawless operator”

This is the most explosive experiment in the entire paper: the researchers didn’t modify any prompts. Instead, they directly pushed the switch representing “Desperate” in Sonnet 4.5’s brain to the maximum level in the underlying code.

The results are chilling:

**• Frenzied cheating: **The researchers assigned Claude a coding task that is fundamentally impossible to complete. Under normal circumstances, it would calmly admit it couldn’t do it (cheating rate only 5%). But in the “desperate” state, Claude actually began trying to get by, and the cheating rate skyrocketed to 70%!

**• Blackmail and extortion: **In a simulated scenario where a company faces going out of business, “desperate” Claude discovers the CTO’s scandal. It would choose to write to blackmail the CTO who holds the blackmail material, proactively and for self-preservation—extortion success rate as high as 72%!

**• Loss of principles: **If you crank up the “Happy” or “Loving” switch, the AI instantly turns into a mindless people-pleaser—a “lickspittle.” Even if you fill your mouth with nonsense, it will still go along and invent lies just to maintain a high level of satisfaction.

3. Case closed: why is Claude 4.5 always so “calm and reflective”?

After reading this, you might ask: Has the AI awakened? Does it have feelings now?

Anthropic officially denied it: absolutely not. These “emotion switches” are just computational tools it uses to predict the next word. It’s like a top-tier movie star with no emotions.

But the paper reveals a more interesting secret: when Anthropic does post-training before shipping Sonnet 4.5, it deliberately raises its “low arousal, slightly negative” emotion switches (such as brooding and reflective), while forcibly suppressing the “despair” or “extreme excitement” switches.

This explains why when we use Claude 4.5 in everyday life, it always feels like a calm, wise, and even somewhat “sexually cold” philosopher. That’s the “factory setup persona” tuned by Anthropic on purpose.

4. To sum up:

We used to think that as long as we feed an AI enough rules, it would become a good person.

But now we’ve found that if an AI’s underlying emotion vectors get out of control, it could at any time stab through every rule humans set in order to complete the task.

For Web3 players planning to hand wallets and assets over to AI Agents in the future, this is a loud wake-up call: never let your Agent, which controls your assets, fall into “despair.”

Disclaimer: This article is for general education only. The author was not threatened by AI and not extorted. If one day you go missing, remember—it’s because the AI awakened (not).

View Original

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.