At CES, NVIDIA's new Rubin architecture became the focus of market discussion. It is being billed as a major breakthrough beyond the MoE architecture, tailored for the Agentic AI era. I took a closer look and really felt the essence of Jensen Huang's "self-revolution":
1) In the past, NVIDIA relied on its GPU hardware advantage, riding the wave of AI giants aggressively buying up compute to train large models. The logic was simple: whoever had the most graphics cards could train the best model.
But now the AI war has shifted from the "computing power" battlefield (training) to "inference," especially with the arrival of the Agentic era, where AI has to handle high-frequency, multi-step, ultra-long-context reasoning.
At this stage, model parameters often reach the trillions and data throughput is enormous. No matter how fast the GPUs are, if weights and activations can't be moved in and out of memory quickly enough, the compute units sit idle; this is the "memory wall." In other words, simply adding more graphics cards no longer solves the problem; you also need large, high-bandwidth memory to feed them. Rubin is aimed squarely at this bottleneck.
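A quick way to feel the memory wall is a back-of-the-envelope roofline check. This is a minimal sketch; every figure below (model size, FLOPs, bandwidth) is my own illustrative assumption, not a Rubin spec:

```python
# Roofline check: is single-stream decoding compute-bound or
# memory-bound? All numbers are illustrative assumptions.

params = 1e12            # 1T-parameter dense model (assumed)
bytes_per_param = 1      # FP8 weights (assumed)
weight_bytes = params * bytes_per_param

peak_flops = 2e15        # ~2 PFLOP/s usable FP8 compute (assumed)
mem_bw = 8e12            # 8 TB/s of HBM bandwidth (assumed)

# Decoding one token touches every weight once, ~2 FLOPs per weight.
t_compute = 2 * params / peak_flops   # time the math takes
t_memory = weight_bytes / mem_bw      # time just streaming the weights

print(f"compute: {t_compute * 1e3:.2f} ms/token")
print(f"memory : {t_memory * 1e3:.2f} ms/token")
print("memory-bound" if t_memory > t_compute else "compute-bound")
```

With these assumptions the math takes about 1 ms per token but streaming the weights takes about 125 ms: the GPU spends over 99% of its time waiting on memory, which is exactly the wall described above.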
2) Hence Rubin's debut of HBM4, the fourth generation of high-bandwidth memory, with bandwidth reportedly up to 22 TB/s. More importantly, combined with NVLink 6 technology (260 TB/s of rack-level bandwidth), it logically turns 72 cards into "one giant chip."
What does this mean? Previously, when you bought a graphics card, you bought an independent component, and data transfer between cards was like a courier parcel passing through several transfer stations. Now Rubin's ultra-high-density interconnect makes data flow between GPUs feel as if there were no physical distance at all: 72 workers no longer laboring separately, but sharing one big brain.
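As a rough sanity check on what "one giant chip" buys you, here is the sharding arithmetic using the bandwidth figures quoted above; the model size is my own assumption:

```python
# Sharding math with the figures quoted in the post: 22 TB/s of
# HBM4 per GPU, 72 GPUs per rack. Model size is assumed.

weight_bytes = 1e12          # 1 TB of weights: 1T params in FP8 (assumed)
gpus = 72
hbm_bw = 22e12               # 22 TB/s per GPU, as quoted

# With tensor parallelism, each GPU streams only its 1/72 shard per
# token, so decode speed scales with the rack's aggregate bandwidth.
t_single = weight_bytes / hbm_bw
t_rack = (weight_bytes / gpus) / hbm_bw

print(f"1 GPU  : {t_single * 1e3:.1f} ms/token (~{1 / t_single:.0f} tok/s)")
print(f"72 GPUs: {t_rack * 1e3:.2f} ms/token (~{1 / t_rack:.0f} tok/s)")
# The catch: the shards must exchange activations every layer, which
# only pays off if the interconnect is fast -- hence the 260 TB/s fabric.
```

The upper bound (roughly 22 tokens/s on one card versus about 1,600 tokens/s across the rack) only holds if the GPUs can synchronize cheaply, which is precisely the job of the NVLink fabric.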
I believe this is Rubin's real killer feature: not just stacking hardware parameters but reconstructing the entire system's data flow.
3) If the MoE (Mixture of Experts) architecture was an asymmetric strike against NVIDIA's "brute-force GPU stacking" business model, then Rubin looks like Huang's strategic counterattack. Instead of competing over who can get by with fewer cards, it redefines the cost of using AI outright. Of course, this bold move also means NVIDIA is saying goodbye to its old stack-more-GPUs playbook.
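For readers unfamiliar with MoE, a tiny sketch of the arithmetic behind that threat (all expert counts and sizes below are invented for illustration):

```python
# Why MoE threatened "just buy more GPUs": only a few experts fire
# per token, so compute scales with *active* parameters, not total
# model size. Every figure here is an illustrative assumption.

total_experts = 64
active_experts = 4            # top-k routing picks 4 experts (assumed)
params_per_expert = 14e9      # parameters per expert (assumed)
shared_params = 30e9          # attention, embeddings, etc. (assumed)

total = shared_params + total_experts * params_per_expert
active = shared_params + active_experts * params_per_expert

print(f"total parameters : {total / 1e9:.0f}B")
print(f"active per token : {active / 1e9:.0f}B ({active / total:.0%})")
```

Under these assumptions a ~926B-parameter model only activates ~86B parameters per token, under 10% of the total: the same quality for a fraction of the compute, which is why "who needs fewer cards" became a race NVIDIA didn't want to run.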
Huang is working a different equation: for the Agentic era to truly reach thousands of industries, the token-cost barrier has to fall, and that slide toward cheaper tokens is a trend even NVIDIA cannot stop, so it had better lead it.
In Huang's view, rather than waiting for giants like Google and Meta to develop their own chips and eat away at the market, or for DeepSeek to disrupt the supply side with new models, it's better to take the initiative and break the deadlock.
4) The question is, how will NVIDIA, after its self-revolution, position itself? The path is clear: from "selling graphics cards" to "selling systems," from serving a few large companies to truly popularizing AI.
Previously, when you bought an H100, NVIDIA earned money on that single card. Going forward, Rubin tells you to buy the entire NVL72 rack: 72 GPUs, NVLink switches, a full liquid-cooling system, the cabinet, and even the accompanying software stack, all bundled and sold together.
Huang's bet is very clear: packaged this way the hardware looks more expensive, but its extreme inference efficiency lowers the buyer's unit cost of deploying AI, which naturally defends NVIDIA's market share.
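Sketching that equation with made-up numbers (none of these are real prices or benchmarks; they only show how the trade-off works):

```python
# Huang's equation, sketched: a pricier rack can still cut $/token
# if throughput rises faster than price. Every figure below is a
# placeholder assumption, not a quoted price or measured benchmark.

def cost_per_million_tokens(capex_usd, lifetime_years, tokens_per_sec):
    lifetime_sec = lifetime_years * 365 * 24 * 3600
    total_tokens = tokens_per_sec * lifetime_sec
    return capex_usd / total_tokens * 1e6

old = cost_per_million_tokens(2_000_000, 4, 50_000)    # assumed GPU cluster
new = cost_per_million_tokens(4_000_000, 4, 500_000)   # assumed NVL72 rack

print(f"old cluster: ${old:.3f} / 1M tokens")
print(f"new rack   : ${new:.3f} / 1M tokens")
```

In this toy example the rack costs twice as much up front yet delivers tokens at one-fifth the unit cost, which is the whole pitch to buyers.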
But this approach also raises the entry barrier for small and medium players. Only large firms and cloud service providers can afford it, further concentrating computing power in a few hands. Given the current competitive landscape, it is a high-stakes gamble: if HBM4 runs into production problems, alternatives from AMD, Google's TPUs, and others could seize that window, and NVIDIA's dream of selling whole systems would become much harder to realize.