Google DeepMind has released a new robotics foundation model, Gemini Robotics ER 1.6, where “ER” stands for Embodied Reasoning. The model achieves state-of-the-art (SOTA) performance in visual and spatial reasoning and is already available through the Gemini API. Logan Kilpatrick, Head of Developer Relations at Google AI, announced the release on social media. (Source)
What is Embodied Reasoning?
Embodied Reasoning refers to an AI model’s ability to understand and reason about the physical world. Unlike traditional language models, embodied reasoning models must process the positions, shapes, materials, and physical interaction relationships of objects in three-dimensional space. Gemini Robotics ER 1.6 is specifically optimized for these kinds of tasks, enabling robots to understand their surroundings more accurately and make appropriate action decisions.
Core capabilities
The main advantages of Gemini Robotics ER 1.6 focus on two areas:
| Capability | Description |
| --- | --- |
| Visual reasoning | Identifies objects in images and video, understands the structure of a scene, and makes decisions accordingly |
| Spatial reasoning | Understands the relative positions, distances, and orientations of objects in three-dimensional space, supporting complex manipulation planning |
The combination of these two capabilities allows robots to handle more complex real-world tasks. For example, in a warehouse environment, robots need to identify objects of different shapes at the same time and calculate the best grasp angle and placement position—this is exactly the kind of scenario Gemini Robotics ER 1.6 excels at.
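The article does not specify the model’s output format. As an illustration only: earlier Gemini Robotics-ER releases return object locations as `[y, x]` points normalized to a 0–1000 grid, and assuming ER 1.6 keeps that convention, a robot controller would convert those points to pixel (and ultimately workspace) coordinates roughly like this minimal sketch:

```python
import json

def point_to_pixels(point, image_width, image_height):
    """Convert a [y, x] point on the 0-1000 normalized grid to pixel coordinates.

    The [y, x] ordering and 0-1000 normalization follow earlier
    Gemini Robotics-ER releases; this is an assumption for ER 1.6.
    """
    y_norm, x_norm = point
    x_px = round(x_norm / 1000 * image_width)
    y_px = round(y_norm / 1000 * image_height)
    return x_px, y_px

# Illustrative model response (not real output from the model):
response_text = '[{"point": [500, 250], "label": "box"}]'
for detection in json.loads(response_text):
    x, y = point_to_pixels(detection["point"], image_width=1280, image_height=720)
    print(detection["label"], x, y)  # -> box 320 360
```

A real pipeline would then map pixel coordinates through the camera’s calibration to obtain a 3D grasp target.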
Using the Gemini API
Unlike many past robot models that existed only as research papers, Gemini Robotics ER 1.6 is already accessible via the Gemini API. This means developers and hardware vendors can integrate the model directly into their own robotic systems without training a model from scratch.
Opening up the API also lowers the barrier to entry for robot AI development. In the past, building a robot system with visual and spatial reasoning capabilities required extensive data collection and model training. Now, developers can focus on hardware design and application scenarios, leaving the underlying reasoning to Gemini Robotics ER 1.6.
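As a sketch of what that integration might look like, the snippet below builds a `generateContent` request for the Gemini API’s REST endpoint with an inline image and a text prompt. The model identifier `gemini-robotics-er-1.6` and the prompt are assumptions for illustration; check Google’s published model list for the exact name.

```python
import base64
import json
import os
import urllib.request

MODEL = "gemini-robotics-er-1.6"  # assumed identifier, verify against Google's model list
URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

def build_request(image_bytes, prompt):
    """Build a generateContent request body: one inline JPEG plus a text prompt."""
    return {
        "contents": [{
            "parts": [
                {"inline_data": {
                    "mime_type": "image/jpeg",
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
                {"text": prompt},
            ]
        }]
    }

if __name__ == "__main__":
    body = build_request(b"...jpeg bytes...", "Point to every box on the shelf.")
    api_key = os.environ.get("GEMINI_API_KEY")
    if api_key:  # only send the request when a key is configured
        req = urllib.request.Request(
            URL,
            data=json.dumps(body).encode(),
            headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        )
        with urllib.request.urlopen(req) as resp:
            print(json.load(resp))
```

In practice you would more likely use Google’s official `google-genai` SDK, which wraps this endpoint; the raw request is shown here only to make the payload structure explicit.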
Google’s robotics AI roadmap
Gemini Robotics ER 1.6 is Google DeepMind’s latest achievement in robotics. From the early RT-2 to the present Gemini Robotics series, Google has continued extending the capabilities of large language models into interaction with the physical world. The ER 1.6 version further improves reasoning accuracy over its predecessors, performing especially well in scenarios that require precise manipulation.
As the robotics industry enters a new growth cycle, foundation models with strong visual and spatial reasoning capabilities will become key infrastructure. To learn more about the development of the Gemini ecosystem, you can refer to the complete Gemini guide.
This article Google launches Gemini Robotics ER 1.6: SOTA robot model, strong in visual and spatial reasoning was first published on Chain News ABMedia.