China's AI Computing Power Counterattack

Cost itself is progress.

Written by: Sleepy.txt

Eight years ago, ZTE’s heart stopped.

On April 16, 2018, a ban issued by the U.S. Department of Commerce’s Bureau of Industry and Security brought ZTE, the world’s fourth-largest telecom equipment manufacturer with 80,000 employees and annual revenue exceeding 100 billion yuan, to a halt overnight. The ban was simple: for the next seven years, no U.S. company could sell parts, products, software, or technology to ZTE.

Without Qualcomm chips, base station production stopped. Without Google’s Android license, its phones had no usable operating system. Twenty-three days later, ZTE announced that its main business operations had ground to a halt.

But ZTE ultimately survived, at a cost of $1.4 billion.

A $1 billion fine paid in one lump sum, plus $400 million placed in escrow at a U.S. bank. On top of that, the entire executive team was replaced and a U.S. compliance monitoring team was installed. For 2018, ZTE posted a net loss of 7 billion yuan, with revenue down 21.4% year-over-year.

ZTE’s then-chairman Yin Yimin wrote in an internal letter: “We are in a complex industry that is highly dependent on the global supply chain.” At the time, it read as both a reflection and an admission of helplessness.

Eight years later, on February 26, 2026, China’s AI unicorn DeepSeek announced that its upcoming V4 multimodal large model would prioritize deep cooperation with domestic chip manufacturers, achieving the first full-process non-NVIDIA solution from pretraining to fine-tuning.

In other words: We no longer need NVIDIA.

Once the news broke, the market’s first reaction was skepticism. NVIDIA holds over 90% of the global AI training chip market. Abandoning it—does that make business sense?

But behind DeepSeek’s choice lies a bigger question than business logic: What kind of computational independence does Chinese AI truly need?

What is really being held back?

Many believe that chip bans are about hardware. But what truly suffocates Chinese AI companies is something called CUDA.

CUDA, short for Compute Unified Device Architecture, is a parallel computing platform and programming model launched by NVIDIA in 2006. It allows developers to directly harness NVIDIA GPUs’ computing power to accelerate complex calculations.

Before the rise of AI, CUDA was a tool for a few tech enthusiasts. But with the wave of deep learning, CUDA became the foundation of the entire AI industry.

Training large AI models is essentially massive matrix computations—precisely what GPUs excel at.
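How tight is the coupling? A minimal sketch (assuming a PyTorch build with CUDA support; illustrative, not production code) shows how a single innocuous line of framework code lands on NVIDIA’s stack:

```python
# Minimal sketch: one line of matrix math in PyTorch.
# On a CUDA build, the matmul below is silently dispatched to
# NVIDIA's proprietary libraries (cuBLAS kernels).
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(4096, 4096, device=device)  # e.g., a weight matrix
b = torch.randn(4096, 4096, device=device)  # e.g., a batch of activations

c = a @ b  # executed by cuBLAS when device == "cuda"
print(c.shape, c.device)
```

Swap in a non-NVIDIA accelerator and that one line needs a different backend, and so do the kernels, profilers, and libraries beneath it. That, not any single chip, is the moat.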

Thanks to groundwork laid more than a decade in advance, NVIDIA used CUDA to build a complete toolchain, from hardware to application, for global AI developers. Today, all major AI frameworks, from Google’s TensorFlow to Meta’s PyTorch, are deeply integrated with CUDA.

A PhD student specializing in AI starts learning, coding, and experimenting in a CUDA environment from day one. Every line of code they write reinforces NVIDIA’s moat.

By 2025, the CUDA ecosystem had over 4.5 million developers, supporting more than 3,000 GPU-accelerated applications, and over 40,000 companies worldwide were using it. In practice, more than 90% of global AI developers work inside NVIDIA’s ecosystem.

CUDA’s power lies in its flywheel effect: the more developers use it, the more tools, libraries, and code are created, making the ecosystem more vibrant; a more vibrant ecosystem attracts even more developers. Once spinning, this wheel is almost impossible to stop.

As a result, NVIDIA not only sells you the most expensive shovels, it defines the only mining posture. Want a different shovel? Fine. But you will have to rewrite the experience, tools, and code that hundreds of thousands of the world’s smartest people have accumulated under this paradigm over nearly two decades.

Who bears this cost?

So when, on October 7, 2022, BIS’s first round of controls restricted exports of NVIDIA’s A100 and H100 chips to China, Chinese AI companies felt their own version of ZTE’s suffocation for the first time. NVIDIA quickly released “China-specific” A800 and H800 chips, reducing interconnect bandwidth to keep supply going.

But just a year later, on October 17, 2023, a second round of tighter controls banned A800 and H800, and 13 Chinese companies were added to the entity list. NVIDIA had to launch further cut-down versions like H20. By December 2024, the last round of controls during Biden’s term was implemented, with even H20 exports strictly limited.

Three rounds of controls, layer upon layer.

But this time, the story’s trajectory is very different from ZTE’s.

An Asymmetric Breakthrough

Under the bans, everyone thought China’s big-model AI dream would end here.

They were wrong. Faced with a blockade, Chinese companies did not choose direct confrontation; they staged a breakout. And its first battlefield was not chips, but algorithms.

From late 2024 to 2025, Chinese AI companies collectively shifted toward a new technical approach: mixture-of-experts models.

Simply put, this involves splitting a huge model into many small experts, activating only the most relevant ones for a given task, rather than running the entire model.
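As an illustration of the idea (a hypothetical top-k gating layer, not DeepSeek’s actual code), the routing logic fits in a few lines: a small router scores every expert for each token, and only the top k experts actually run:

```python
import torch
import torch.nn as nn

# Hypothetical top-k mixture-of-experts layer (illustrative only).
# A router scores all experts for each token; only the top-k execute.
class TopKMoE(nn.Module):
    def __init__(self, dim: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)   # the gating network
        self.experts = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim)
        scores = self.router(x)                     # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # k experts per token
        weights = weights.softmax(dim=-1)           # normalize gate weights
        out = torch.zeros_like(x)
        for slot in range(self.k):                  # each chosen slot
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed here
                if mask.any():
                    gate = weights[mask, slot].unsqueeze(-1)
                    out[mask] += gate * expert(x[mask])
        return out
```

Per-token compute scales with k, while total model capacity scales with the number of experts. That is exactly the lever that lets a 671-billion-parameter model run on a 37-billion-parameter budget.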

DeepSeek’s V3 exemplifies this approach. It has 671 billion total parameters but activates only 37 billion (about 5.5%) per token. Training used 2,048 NVIDIA H800 GPUs for around 58 days, at a stated cost of about $5.576 million. By comparison, GPT-4’s training cost is estimated at around $78 million, an order of magnitude more.
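That headline cost is easy to sanity-check from the numbers just quoted, assuming round-the-clock utilization:

$$
2048 \text{ GPUs} \times 58 \text{ days} \times 24\,\text{h} \approx 2.85 \text{ million GPU-hours}, \qquad \frac{\$5.576\text{M}}{2.85\text{M GPU-hours}} \approx \$2 \text{ per GPU-hour}.
$$

That is roughly the market rental rate for an H800, which suggests the figure counts compute rental only, not researcher salaries, ablation studies, or failed runs. Even so, the gap with GPT-4 stands.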

Extreme algorithmic optimization feeds directly into price. DeepSeek’s API charges between $0.028 and $0.28 per million input tokens, and $0.42 per million output tokens. GPT-4 costs $5 for input and $15 for output; Claude Opus is more expensive still, at $15 and $75. Taken at face value, those list prices make DeepSeek dozens of times cheaper than GPT-4 and one to two orders of magnitude cheaper than Claude.

This price gap is reshaping the global developer market. By February 2026, on OpenRouter, the world’s largest AI model API platform, weekly calls to Chinese models had surged 127% in three weeks, overtaking U.S. models for the first time. A year earlier, Chinese models held less than 2% of OpenRouter’s market share; over the following year usage grew 421%, and their share approached 60%.

Behind these numbers lies a subtle but critical structural change. Starting in late 2025, mainstream AI application scenarios shifted from chat to agents. In agent scenarios, token consumption per task is 10 to 100 times higher than simple chat. When token consumption grows exponentially, price becomes a decisive factor. Chinese models’ extreme cost-effectiveness hits this window perfectly.
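A back-of-the-envelope sketch makes the stakes concrete. The workload numbers below are hypothetical; only the per-million-token list prices come from the figures above:

```python
# Hypothetical agent workload: 1M tasks/day, each burning 50x the tokens
# of a simple chat turn. Prices are the list rates quoted above, in USD
# per million tokens as (input, output).
CHAT_TOKENS = (500, 300)       # (input, output) tokens for one chat turn
AGENT_MULTIPLIER = 50          # agents burn 10-100x more; assume 50x
TASKS_PER_DAY = 1_000_000

PRICES = {
    "DeepSeek":    (0.28, 0.42),
    "GPT-4":       (5.00, 15.00),
    "Claude Opus": (15.00, 75.00),
}

tokens_in = CHAT_TOKENS[0] * AGENT_MULTIPLIER * TASKS_PER_DAY
tokens_out = CHAT_TOKENS[1] * AGENT_MULTIPLIER * TASKS_PER_DAY

for model, (p_in, p_out) in PRICES.items():
    cost = tokens_in / 1e6 * p_in + tokens_out / 1e6 * p_out
    print(f"{model:12s} ${cost:>12,.0f} per day")
```

Under these made-up assumptions, the daily bill comes to roughly $13,000 on DeepSeek, $350,000 on GPT-4, and $1.5 million on Claude Opus. At agent-scale token volumes, price stops being a detail and becomes the business model.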

But lowering inference costs does not solve the fundamental problem of training. A large model that cannot be continuously retrained and iterated on fresh data quickly degrades in capability. Training remains the unavoidable black hole of compute.

So, where does the “shovel” for training come from?

The Rise of the Backup

Xinghua, Jiangsu, is an unassuming small city known for stainless steel and health foods, with no AI story to speak of. Yet in 2025, a 148-meter production line for domestic computing servers was built and brought online there, just 180 days from contract signing to production.

The core of this line is two fully domestic chips: the Loongson 3C6000 processor and the Taichuyuan Qi T100 AI accelerator card. Loongson 3C6000 is fully self-developed in instruction set and microarchitecture. Taichuyuan Qi, derived from the National Supercomputing Center in Wuxi and Tsinghua University teams, adopts a heterogeneous many-core architecture.

When operating at full capacity, this line can produce a server every five minutes. The total investment is 1.1 billion yuan, with an expected annual output of 100,000 units.

More importantly, these domestically produced chips form a cluster that has already begun undertaking real large-model training tasks.

In January 2026, Zhipu AI and Huawei jointly released GLM-Image, the first state-of-the-art image generation model trained entirely on domestic chips. In February, China Telecom’s trillion-parameter “Xingchen” large model completed full training on a 10,000-chip domestic cluster in Shanghai’s Lingang.

These cases demonstrate one key point: domestic chips have moved from “usable for inference” to “capable of training.” This is a qualitative leap. Inference only requires running a trained model; training demands handling massive data, complex gradient calculations, and parameter updates—requiring far greater compute power, interconnect bandwidth, and software ecosystems.
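The difference is visible even in a toy sketch (illustrative PyTorch, not anyone’s production code): inference is a single forward pass, while every training step adds a backward pass, gradient storage, and a parameter update, all of which must then be synchronized across thousands of chips:

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024)          # stand-in for a large model
x = torch.randn(32, 1024)
target = torch.randn(32, 1024)

# Inference: one forward pass, no gradient bookkeeping.
with torch.no_grad():
    y = model(x)

# Training: forward pass PLUS loss computation, a backward pass that
# stores a gradient for every parameter (and keeps activations in memory
# to compute them), and an optimizer update. At cluster scale, each step
# also requires synchronizing gradients across all participating chips.
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
opt.step()
opt.zero_grad()
```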

The core force behind these tasks is Huawei’s Ascend series chips. By the end of 2025, the Ascend ecosystem had over 4 million developers, more than 3,000 partners, and 43 mainstream large models pre-trained on Ascend. Over 200 open-source models had been adapted. At the MWC in March 2026, Huawei launched its new generation of computing base, SuperPoD, targeting overseas markets.

Ascend 910B’s FP16 performance already rivals NVIDIA’s A100. Gaps remain, but the chips have moved from “not usable” to “usable,” and are now heading toward “good enough.” Ecosystem building cannot wait for a perfect chip; it has to start once the hardware is sufficient, letting real business workloads drive the iteration of chips and software alike.

ByteDance, Tencent, and Baidu all plan to double their deployments of domestic computing servers in 2026 compared with the year before. According to MIIT, China’s AI computing capacity has reached 1,590 EFLOPS. 2026 is shaping up to be the year of large-scale domestic compute deployment.

The U.S. Power Shortage and China’s Push Abroad

In early 2026, Virginia—home to a large portion of global data center traffic—halted approval of new data center projects. Georgia followed suit, with approvals suspended until 2027. Illinois and Michigan also introduced restrictions.

According to IEA data, U.S. data centers consumed 183 TWh of electricity in 2024, about 4% of the national total. By 2030, this is projected to more than double to 426 TWh, possibly exceeding 12%. Arm’s CEO predicts that by 2030, AI data centers will consume 20% to 25% of U.S. electricity.

The U.S. power grid is already strained. The PJM grid covering 13 eastern states faces a 6 GW capacity shortfall. By 2033, the U.S. as a whole will face a 175 GW capacity gap—enough to power 130 million households. Wholesale power costs in data center clusters have increased by 267% over five years.

The limit of compute is energy. And on this dimension, the gap between China and the U.S. is even wider than in chips, just pointing the other way.

China generates 10.4 trillion kWh of electricity a year to the U.S.’s 4.2 trillion, roughly 2.5 times as much. Residential use accounts for only 15% of China’s consumption, versus 36% in the U.S., leaving far more industrial electricity available for compute infrastructure.

On price, U.S. AI clusters pay about $0.12 to $0.15 per kWh, while industrial electricity in western China runs around $0.03 per kWh, a quarter to a fifth of U.S. rates.

China’s newly added power capacity each year already runs about seven times that of the U.S.

While the U.S. struggles with power shortages, Chinese AI is quietly going global. But this time, it’s not products or factories—it’s tokens.

Tokens, the smallest units of information processed by AI models, are becoming a new digital commodity. They are produced in China’s compute factories and transported via submarine cables worldwide.

DeepSeek’s user distribution illustrates this well: 30.7% in China, 13.6% in India, 6.9% in Indonesia, 4.3% in the U.S., 3.2% in France. Supporting 37 languages, it is popular in emerging markets like Brazil. Over 26,000 enterprises have accounts, with 3,200 deploying enterprise versions.

By 2025, 58% of new AI startups incorporated DeepSeek into their tech stacks. In China, DeepSeek holds 89% of the market share. In sanctioned countries, market share ranges from 40% to 60%.

This scene resembles a war over industry autonomy from forty years ago.

In 1986, under intense U.S. pressure, Japan signed the U.S.-Japan Semiconductor Agreement. Its core terms: Japan had to open its semiconductor market, with foreign (chiefly U.S.) chips guaranteed more than a 20% share; Japanese semiconductors could not be exported below cost; and $300 million worth of Japanese goods faced 100% punitive tariffs. The U.S. also blocked Fujitsu’s acquisition of Fairchild Semiconductor.

At that time, Japan’s semiconductor industry was at its peak. By 1988, Japan controlled 51% of the global semiconductor market to the U.S.’s 36.8%, and six Japanese firms sat in the global top ten: NEC second, Toshiba third, Hitachi fifth, Fujitsu seventh, Mitsubishi eighth, Panasonic ninth. In 1986, Intel lost $173 million in the U.S.-Japan memory war and came close to bankruptcy.

But after the agreement, everything changed.

The U.S. then applied comprehensive pressure through Section 301 investigations and similar tools, while South Korea’s Samsung and Hynix, with U.S. backing and lower prices, ate into Japan’s market share. Japan’s DRAM share plummeted from 80% to 10%. By 2017, Japan’s share of the IC market was just 7%. Once-dominant giants were split up, acquired, or quietly exited amid unending losses.

The tragedy of Japan’s semiconductors was that they were content to be the best producers in a global division of labor dominated by external forces, never building their own independent ecosystem. When the tide receded, they found they had nothing but manufacturing.

Today’s Chinese AI industry faces a similar but fundamentally different crossroads.

The similarity: external pressure is immense. Three rounds of chip controls, layered on top of each other; the CUDA ecosystem barrier remains high.

The difference: this time, we are choosing a harder path. From extreme algorithmic optimization, to domestic chips crossing from inference to training, to the 4 million developers in the Ascend ecosystem, to token exports penetrating global markets. Every step is building an industry ecosystem that Japan never had.

Epilogue

On February 27, 2026, three performance reports from domestic AI chip companies were released on the same day.

Cambricon’s revenue surged 453%, and it turned a full-year profit for the first time. Moore Threads grew 243% but still posted a net loss of 1 billion yuan. MetaX (Muxi) grew 121%, with a net loss close to 800 million yuan.

Half is fire, half is sea.

The fire: the market’s extreme hunger. The roughly 95% of the Chinese market that Jensen Huang’s NVIDIA once held, and has now been forced to vacate, is being filled inch by inch, and it shows up in these domestic firms’ revenue lines. Whatever their current performance or ecosystem maturity, the market needs a second choice beyond NVIDIA. This is a rare structural opening torn open by geopolitics.

The sea: the enormous cost of ecosystem building. Every loss is paid in real money—R&D investments, software subsidies, engineers on-site solving compilation issues. These losses are not due to mismanagement but are the war tax for building an independent ecosystem.

These three reports record the true face of this compute war more honestly than any industry analysis: not a victory march, but a brutal battlefield, advancing while bleeding.

But the nature of the war has already changed. Eight years ago, we discussed whether we could survive. Today, we ask how much it costs to survive.

Cost itself is progress.
