AI Leaders Debate Model Differentiation and China's Embodied Intelligence Opportunity at Zhiyuan Conference

Industry leaders at the Beijing Zhiyuan Conference debated AI model homogenization concerns as top model evaluation performances increasingly converge and the gap between open-source and closed-source models is believed to be only 3-6 months. Bluerun Ventures Managing Partner Chen Weiguang, Zhiyuan Research Institute Director Wang Zhongyuan, Galaxy General Founder and CTO Wang He, and MiniMax CEO Li Dahai discussed long-term value sources in the large model era. The panel addressed whether AI models and embodied intelligence industries are moving toward homogenization and where lasting competitive advantages exist. Industry observers view talent as key in the US-China AI competition, with embodied intelligence representing China's opportunity to achieve breakthrough moments comparable to AlphaGo and ChatGPT.

Industry Leaders Reject Homogenization Concerns at Zhiyuan Conference

Wang Zhongyuan said that although the proliferation of large model leaderboards is dizzying and the rankings themselves are not entirely credible, model companies willing to run live demonstrations and deploy in real-world scenarios show confidence and can build closed data loops there. He said that large model performance is far from hitting a bottleneck, that technical routes have not converged, and that the market could settle into several patterns, including "one superpower with multiple strong players" or "multiple giants standing side by side." Wang called claims that the industry is heading toward homogenization premature.

Galaxy General Founder Wang He extended the discussion from large language models to embodied intelligence. He stated that large language models themselves still have many variables, with greater uncertainty in multimodal and video understanding capabilities. Wang characterized embodied intelligence as currently at "the stage from GPT-1 to GPT-2," with the industry just entering an acceleration period.

Wang He described embodied intelligence's competitive moat as a complete system encompassing source data supply (synthetic data, human data, robot data), data refinement capabilities, hardware iteration and software-hardware co-design, model throughput fusion capabilities, and final hardware delivery capabilities. He characterized this as a "hexagonal warrior" system, a Chinese expression for an all-rounder with no weak dimension, and stated that no mature product of this type exists anywhere in the world and that the moat remains extremely deep.

MiniMax CEO Li Dahai cited Anthropic's commercial success as direct evidence against homogenization. He stated that large models cannot rely on general horizontal capabilities alone; like "T-shaped talents," they must also have vertical strengths. Li explained that Anthropic became a global phenomenon because it built coding capabilities to an unparalleled level on top of its general model foundation, which supports its high valuation and strong commercial performance.

Li stated that large models are evolving as part of whole systems rather than as isolated technical components. He said that future model optimization must be tightly coordinated with application scenarios, comparing it to engine design, which must fit the entire vehicle: the optimization targets for an F1 racing car are completely different from those for a car used for grocery shopping. Li stated that technical generality and commercial generality must be treated separately. Good commercialization requires intensive scenario-specific model optimization, he said, which allows each company to establish its own moat by finding the right direction.

Galaxy General Reports Embodied Intelligence at GPT-1 to GPT-2 Stage

Wang He shared Galaxy General's practice with the WAM (World Action Model) paradigm. Before the WAM paradigm emerged, Galaxy General used 1 billion frames of simulation data to verify scaling possibilities for grasping skills. The company developed GRASP-VLA to achieve zero-shot grasping of arbitrary objects, with no models relying on real teleoperation data reaching equivalent performance levels to date.

Wang explained that the emergence of the WAM paradigm completely broke the data bottleneck for embodied intelligence. Traditional VLA models require data with action labels and can only rely on robot data. WAM focuses on Action as the core, performing visual-level action planning through future prediction without requiring action labels. This means robots can directly learn behavioral logic from human videos, with massive human video data becoming training material.

Wang stated that Galaxy General published the world's first WAM paper in March 2025, and in April NVIDIA Embodied Intelligence Lab Director Jim Fan stated that the endgame for robots is WAM. Wang characterized embodied intelligence pre-training as entering an explosive period with no limitations on data acquisition. He stated that over the next two years, embodied intelligence will fully usher in its GPT-3.5 moment, with the entry ticket being tens of millions of hours of high-quality data and billions in capital investment.

Multimodal AI and Embodied Intelligence Open New Scaling Pathways

Wang Zhongyuan said that last year's industry discussions about the failure of scaling laws stemmed from anxiety that "internet pre-training data has been exhausted." Over the past two years, post-training, reasoning optimization, and recursive self-evolution of agents have brought a new wave of capability improvements. Wang stated that this does not necessarily reflect larger parameter counts in the models themselves, but rather the entire system becoming more capable, with AI shifting from a chat tool to an execution tool.

As a research institute, Zhiyuan is exploring the next growth curve for intelligence. Over the past two years, the institute verified the scaling paradigm in the multimodal field: the Wujie Emu3 series, which uses less than 1% of multimodal data and tens of billions of parameters, already shows clear performance improvements. The institute has now begun advancing toward foundation models for the physical world and exploring scaling paths for world models.

Li Dahai presented MiniMax's "knowledge density law": overall large model intelligence = knowledge density × parameter count. He said that when the company deployed edge models for automotive companies last year, it could only reach 1B parameters; this year that has risen to 4B, and next year it will likely reach tens of billions. As quantization technology improves and knowledge density increases, a stronger model can occupy the same resources after quantization as a weaker one did before, and the expansion of edge model scale is just beginning.
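The relationship Li described can be illustrated with a small Python sketch. It is only a back-of-the-envelope illustration, not MiniMax's method: the function names, the bit widths (16-bit and 4-bit weights), and the knowledge density values are all hypothetical assumptions chosen for the example, and the memory estimate counts model weights only.

```python
def intelligence(knowledge_density: float, params_billions: float) -> float:
    """The law as quoted in the article: intelligence = knowledge density x parameter count.

    Units are arbitrary; only relative comparisons are meaningful.
    """
    return knowledge_density * params_billions


def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Counts weights only and ignores activations, caches, and runtime overhead.
    """
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Hypothetical scenario: a 1B model stored at 16 bits per weight versus
# a 4B model quantized to 4 bits per weight. The bit widths are assumptions.
last_year_gb = weight_memory_gb(params_billions=1, bits_per_weight=16)
this_year_gb = weight_memory_gb(params_billions=4, bits_per_weight=4)

# Hypothetical knowledge densities (1.0 and 1.5), chosen only for illustration.
last_year_score = intelligence(knowledge_density=1.0, params_billions=1)
this_year_score = intelligence(knowledge_density=1.5, params_billions=4)

if __name__ == "__main__":
    print(f"weights: {last_year_gb:.1f} GB vs {this_year_gb:.1f} GB")
    print(f"relative intelligence: {last_year_score:.1f} vs {this_year_score:.1f}")
```

Under these assumed numbers, the quantized 4B model needs the same weight storage as the unquantized 1B model, while the quoted formula assigns it a higher score through both a larger parameter count and a higher knowledge density.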

Li stated that many phased conclusions in the industry have very short shelf lives, with development constantly overturning old perceptions. He stated that not only do edge models have enormous room for growth, but large language models' long context processing and low-power optimization still have scaling potential far from fully explored, with the industry far from reaching a convergence stage.

Panel Identifies China's Supply Chain and Talent Advantages

Wang Zhongyuan stated that AI technology is following the same path as autonomous driving, moving from worry and fear to adaptation and use, and then to the establishment of complete governance systems and mechanisms for allocating responsibility. When a technology can deliver a 3-5x productivity improvement, he said, its adoption cannot be blocked, and humanity, having lived through multiple technological waves, will find corresponding governance solutions.

Li Dahai stated that human society has essentially developed through "learning from mistakes": airplane safety rules and road speed limits each have painful lessons behind them. He said AI technology will make it more efficient to discover vulnerabilities and fix problems, greatly reducing this cost, and that the industry has emphasized safety baselines from the startup stage, with companies proactively assuming social responsibility. Li added that the pattern of learning from mistakes may be difficult to avoid entirely, because safety risks often emerge from unexpected directions, and that improving rules through hard lessons is a reality that must be faced.

Regarding China's differentiation advantages in AI, Wang Zhongyuan stated that China's supply chain, manufacturing advantages, and vast domestic market are sufficient to incubate and catalyze new technology implementation, with embodied intelligence and world models likely becoming areas where China achieves differentiated leadership.

Wang He stated firmly that embodied intelligence is China's opportunity. He expressed conviction that embodied intelligence's "AlphaGo moment" and "ChatGPT moment" will both be realized in China, stating that if zero to one is completed in China, one to one hundred will definitely mature in China.

Li Dahai added what he called the most fundamental factor: China has the largest pool of the world's smartest young AI talent. Combined with its supply chain, ecosystem, and scenario advantages, he said, China will definitely make significant progress in the AI field.

FAQ

What stage did Galaxy General say embodied intelligence has reached?

Galaxy General Founder and CTO Wang He stated at the Beijing Zhiyuan Conference that embodied intelligence is currently at "the stage from GPT-1 to GPT-2," with the industry just entering an acceleration period. Wang stated that over the next two years, embodied intelligence will fully usher in its GPT-3.5 moment, with the entry ticket being tens of millions of hours of high-quality data and billions in capital investment.

How did panel participants respond to AI model homogenization concerns?

Zhiyuan Research Institute Director Wang Zhongyuan stated that overall large model performance iteration is far from reaching a bottleneck and technical routes have not converged, characterizing homogenization claims as premature. MiniMax CEO Li Dahai cited Anthropic's success in coding capabilities as evidence that companies can build differentiation through vertical strengths. Galaxy General's Wang He described embodied intelligence's competitive moat as a complete system encompassing data supply, hardware iteration, and model capabilities, stating no mature products of this type exist worldwide.

What advantages did the panel identify for China's AI development?

Panel participants identified several advantages for China. Wang Zhongyuan cited China's supply chain, manufacturing advantages, and vast domestic market as sufficient to catalyze new technology implementation. Li Dahai described China's possession of the largest number of the world's smartest young AI talents as the most fundamental advantage. Wang He expressed conviction that embodied intelligence's breakthrough moments comparable to AlphaGo and ChatGPT will be realized in China, stating that if zero to one is completed in China, one to one hundred will definitely mature in China.
