Google's Gemini model lets humanoid robot carry out multimodal tasks

Source: interestingengineering
Author: @IntEngineering
Published: 9/30/2025
Google DeepMind has unveiled advancements in its humanoid robots powered by the Gemini Robotics 1.5 AI models, enabling them to perform complex, multi-step tasks through multimodal reasoning. In a recent demonstration video, the bi-arm Franka robot completed the "banana test," sorting fruits by color onto separate plates, an improvement over previous models that could only follow single-step instructions. Another test featured Apptronik's Apollo humanoid sorting laundry by color, adapting even when the baskets were moved mid-task, highlighting the robots' enhanced perception and adaptability.
The Gemini Robotics 1.5 family includes two complementary models: one that converts visual inputs and instructions into motor actions, and another that reasons about the environment to create step-by-step plans. This agentic framework allows robots to autonomously study their surroundings, make decisions, and execute tasks such as sorting waste according to local recycling rules by researching the guidelines online and applying them in real time. Google emphasizes safety in these models, incorporating risk assessment into the robots' reasoning so that potential hazards are considered before an action is carried out.
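To make that division of labor concrete, the following is a minimal, purely illustrative Python sketch of such a planner-plus-actor loop. All names here (PlannerModel, ActionModel, Observation, run_task) are hypothetical stand-ins, not DeepMind's actual API: one model drafts a step-by-step plan from the goal and the scene, the other turns each step plus fresh observations into actions, and re-observing between steps is what allows mid-task adaptation.

from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    """Placeholder for camera frames and robot state."""
    description: str

class PlannerModel:
    """Stand-in for the reasoning model: goal + scene -> ordered plan."""
    def plan(self, goal: str, obs: Observation) -> List[str]:
        # A real model would reason over images and, per the article,
        # could consult online sources (e.g., local recycling rules).
        # Hard-coded steps here for illustration only.
        return [
            f"locate items relevant to: {goal}",
            "group items according to the discovered rule",
            "place each group in its target bin",
        ]

class ActionModel:
    """Stand-in for the vision-to-action model: step + vision -> motion."""
    def execute(self, step: str, obs: Observation) -> None:
        print(f"executing: {step} (scene: {obs.description})")

def run_task(goal: str) -> None:
    obs = Observation("tabletop with mixed waste items")
    planner, actor = PlannerModel(), ActionModel()
    for step in planner.plan(goal, obs):
        actor.execute(step, obs)
        # Re-observe after each step; this is what would let the robot
        # adapt if the scene changes mid-task, as in the laundry demo.
        obs = Observation(f"updated scene after: {step}")

if __name__ == "__main__":
    run_task("sort waste according to local recycling rules")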
Tags
robotics, humanoid-robots, AI-models, multimodal-tasks, autonomous-robots, robot-perception, robot-reasoning