Claude Skills had barely been popular for a minute when DeepSeek dropped a new paper yesterday, using Engram to tell the market: maybe you've all been running in the wrong direction? The LLM world really is a daily clash of the gods! 😱
A simple comparison shows the difference: Anthropic gave the model a super secretary to organize 200 documents for you and remember every conversation; DeepSeek went further and performed brain surgery on the model itself, growing a "memory organ" so it can answer in O(1) constant time, like flipping to a dictionary entry, instead of activating layer after layer of the neural network.
This problem should have been solved long ago.
Ever since the Transformer architecture took over, large models have handled knowledge like rote memorizers: every time you ask "Who is Princess Diana," the query has to pass through the entire brain, all 175B parameters of it, burning enormous amounts of compute.
It's as if, to look up a single word, you had to recite the whole Oxford Dictionary from A to Z before giving the answer. Absurd! Even with today's popular MoE architectures, recalling one obscure fact still means mobilizing a crowd of expensive compute "experts."
1) Core breakthrough of Engram: enabling the model to grow a "memory organ"
Engram does one simple thing: it pulls static factual knowledge out of the model's parametric memory and drops it into a scalable hash table, using N-gram segmentation plus multi-head hashing to achieve O(1) constant-time lookup.
In plain language: Claude Skills is context management, the AI still holds a manual and flips through it when needed; Engram instead grows a new organ in the brain that "recalls" fixed knowledge instantly, with no reasoning step at all.
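To make that concrete, here is a minimal sketch of the hash-table idea, assuming nothing about DeepSeek's actual implementation: the head count, table size, n-gram length, and the blake2b hash are all illustrative choices of mine, not details from the paper.

```python
# Minimal sketch: static knowledge lives in hash-indexed embedding tables,
# read in O(1) per n-gram, with no matrix multiplies involved in the lookup.
import hashlib
import numpy as np

NUM_HEADS = 4         # multi-head hashing: several independent hash functions
TABLE_SIZE = 1 << 16  # buckets per head (toy size)
EMBED_DIM = 64

# one embedding table per hash head
tables = [np.random.randn(TABLE_SIZE, EMBED_DIM).astype(np.float32)
          for _ in range(NUM_HEADS)]

def ngrams(tokens, n=3):
    """Slide an n-token window over the input token ids."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bucket(ngram, head):
    """Hash an n-gram to a bucket index; each head salts the hash differently."""
    key = f"{head}:{ngram}".encode()
    return int(hashlib.blake2b(key, digest_size=8).hexdigest(), 16) % TABLE_SIZE

def lookup(tokens):
    """O(1) memory read per n-gram: pure indexing, no network forward pass."""
    outs = []
    for ng in ngrams(tokens):
        # average the embeddings retrieved by each hash head
        vecs = [tables[h][bucket(ng, h)] for h in range(NUM_HEADS)]
        outs.append(np.mean(vecs, axis=0))
    return np.stack(outs) if outs else np.zeros((0, EMBED_DIM), np.float32)

# usage: made-up token ids standing in for "Who is Princess Diana"
print(lookup([1012, 318, 21752, 31603]).shape)
```

The point of multiple hash heads is that a collision in one head is unlikely to repeat in the others, so averaging across heads (or, per the post, gating by context) softens collision noise.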
How big is the payoff? A 27B-parameter model gains 3.4% on knowledge tasks (MMLU), and long-text retrieval jumps from 84% to 97%. The key point: those memory parameters can be offloaded to cheap DDR memory or even hard drives, so the memory adds almost nothing to inference cost.
2) Is this a revolution against RAG and the GPU arms race?
If Engram really works, the first to feel the impact won't be OpenAI; it will be RAG (Retrieval-Augmented Generation), especially RAG over public knowledge bases, along with NVIDIA's GPU business.
Fundamentally, RAG has the model "look things up" in an external database, which is slow, loosely integrated, and requires maintaining a vector store. Engram embeds the memory module directly into the model architecture, so lookups are fast and accurate, with context gating to filter out hash collisions.
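On the offloading point, the reason costs stay low is that the big lookup table never has to sit in GPU HBM; only the handful of rows a query actually touches move to the accelerator. A toy PyTorch sketch under that assumption (table size and dimensions are made up):

```python
import torch

TABLE_ROWS, EMBED_DIM = 100_000, 64
device = "cuda" if torch.cuda.is_available() else "cpu"

# the "memory organ" lives in cheap host DDR (it could even be memory-mapped
# from disk); nothing here occupies GPU memory up front
memory_table = torch.randn(TABLE_ROWS, EMBED_DIM)

def fetch(indices: torch.Tensor) -> torch.Tensor:
    """Gather only the requested rows and ship them to the compute device."""
    rows = memory_table[indices]               # CPU-side gather, one read per index
    return rows.to(device, non_blocking=True)  # tiny transfer vs. the full table

print(fetch(torch.tensor([3, 141, 592])).shape)  # torch.Size([3, 64])
```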
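How that gating might look, in the simplest possible form: a learned gate reads the current hidden state together with the retrieved vector and scales how much of the memory gets mixed in, so a colliding entry that doesn't fit the context is suppressed. The shapes, the sigmoid gate, and the residual-style mix are my assumptions, not the paper's exact formulation.

```python
import numpy as np

EMBED_DIM = 64
rng = np.random.default_rng(0)
W_g = rng.standard_normal((2 * EMBED_DIM, 1)).astype(np.float32) * 0.02

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_memory(hidden, retrieved):
    """Blend the retrieved memory vector into the hidden state, weighted by a
    context-dependent scalar gate in [0, 1]."""
    g = sigmoid(np.concatenate([hidden, retrieved]) @ W_g)  # scalar gate
    return hidden + g * retrieved

h = rng.standard_normal(EMBED_DIM).astype(np.float32)
m = rng.standard_normal(EMBED_DIM).astype(np.float32)
print(gate_memory(h, m).shape)  # (64,)
```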
Even more exciting, the paper reports a "U-shaped scaling law": give roughly 20-25% of the parameters to Engram as the "memory hard drive" and keep the remaining 75-80% as the traditional neural-network "reasoning brain," and performance keeps climbing roughly logarithmically each time the memory is scaled up tenfold.
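Quick back-of-the-envelope arithmetic on that split, using the 27B figure from earlier (the ratios are the only numbers taken from the post; the rest is plain multiplication):

```python
TOTAL_PARAMS = 27e9  # the 27B model mentioned above

for mem_frac in (0.20, 0.25):
    mem = TOTAL_PARAMS * mem_frac  # Engram "memory hard drive" parameters
    brain = TOTAL_PARAMS - mem     # traditional neural "reasoning brain"
    # only `brain` needs expensive GPU memory; `mem` can be offloaded to DDR/disk
    print(f"{mem_frac:.0%} memory -> {mem / 1e9:.2f}B offloadable, "
          f"{brain / 1e9:.2f}B on-GPU")
```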
This completely breaks the belief that "bigger parameters = smarter," transforming the arms race of "endless H100 stacking" into an efficiency game of "moderate compute + massive cheap memory."
That's all.
Whether DeepSeek V4 lands around the Spring Festival is still anyone's guess, but it may well show off a one-two punch of Engram combined with the earlier mHC.
This paradigm shift, from "compute is king" to a dual engine of "compute + memory," is likely to set off another round of fierce competition; a lot will depend on how giants like OpenAI and Anthropic play their compute advantage.