Overview
Gemini Robotics 1.5 is a vision-language-action model that turns natural-language instructions and sensor input into reliable robot behavior. It plans, grounds its decisions in live perception, and executes skills or code on ROS and industrial stacks, with long-context memory, fast replanning, and built-in safety checks.
Description
Gemini Robotics 1.5 connects perception, reasoning, and control so robots can follow plain-language instructions and adapt in the real world. The model ingests images, video, depth, and text, builds an understanding of the scene with its affordances and goals, and then produces a step-by-step plan that calls skills or emits concise control code for common runtimes such as ROS/ROS 2 and industrial APIs. It keeps a long working memory of prior steps and observations, which lets it recover from failures, ask for clarification when uncertainty is high, and replan in a closed loop as the scene changes.

On manipulation and mobile tasks it fuses multi-camera views, tracks objects through occlusion, and respects spatial constraints, while task libraries provide reusable primitives for grasping, kitting, inspection, and navigation that the model can compose into larger routines. Deployment targets range from simulators to real hardware, with low-latency streaming for control loops and options to run the policy on edge GPUs or behind a service for fleet coordination.

Safety is enforced through constraint-aware planning, confidence thresholds, and guardrailed tool calls, so the system slows, stops, or escalates to a human when conditions fall outside its operating envelope. The result is a practical robotics foundation that moves from instruction to grounded action: robust enough for labs and pilots, and efficient enough to scale into production workflows.
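To make the plan-execute-replan flow described above concrete, here is a minimal Python sketch of a control loop around a vision-language-action model. It is illustrative only: the plan and execute functions, the skill names, the confidence values, and the 0.7 threshold are hypothetical placeholders rather than the actual Gemini Robotics 1.5 API, and a real integration would dispatch each step to a skill runtime such as a ROS 2 action server.

```python
"""Illustrative closed-loop plan-execute-replan cycle around a VLA model.

All names and values here are hypothetical stand-ins, not the real
Gemini Robotics 1.5 interface.
"""

from dataclasses import dataclass, field


@dataclass
class Step:
    skill: str          # e.g. "grasp", "place", "navigate"
    args: dict          # skill parameters (object id, target pose, ...)
    confidence: float   # model's self-reported confidence in this step


@dataclass
class WorkingMemory:
    """Rolling log of observations and executed steps the planner conditions on."""
    events: list = field(default_factory=list)

    def record(self, kind: str, payload: dict) -> None:
        self.events.append({"kind": kind, **payload})


def plan(instruction: str, observation: dict, memory: WorkingMemory) -> list[Step]:
    """Placeholder for a call to the VLA model.

    A real integration would send the instruction, current camera frames, and
    recent memory to the model and parse its structured plan. A canned plan is
    returned here so the loop below is runnable.
    """
    return [
        Step("grasp", {"object": "red_mug"}, confidence=0.92),
        Step("place", {"target": "tray"}, confidence=0.61),
    ]


def execute(step: Step) -> bool:
    """Placeholder for dispatching a skill to the robot runtime.

    Returns True on success; a real system would call a ROS 2 action server
    or an industrial API here.
    """
    print(f"executing {step.skill} with {step.args}")
    return True


def run(instruction: str, observation: dict,
        min_confidence: float = 0.7, max_replans: int = 2) -> None:
    memory = WorkingMemory()
    replans = 0
    queue = plan(instruction, observation, memory)
    while queue:
        step = queue.pop(0)
        # Guardrail: a low-confidence step stops execution and escalates to a
        # human instead of running blindly.
        if step.confidence < min_confidence:
            memory.record("escalation", {"skill": step.skill,
                                         "confidence": step.confidence})
            print(f"confidence {step.confidence:.2f} below {min_confidence}; "
                  "asking a human for help")
            return
        if execute(step):
            memory.record("step", {"skill": step.skill, "success": True})
            continue
        # A failed step is logged and triggers replanning against the updated
        # memory, up to a fixed budget.
        memory.record("step", {"skill": step.skill, "success": False})
        if replans >= max_replans:
            print("replan budget exhausted; stopping")
            return
        replans += 1
        queue = plan(instruction, observation, memory)


if __name__ == "__main__":
    run("put the red mug on the tray",
        observation={"cameras": ["wrist", "overhead"]})
```

The design point the sketch illustrates is that low-confidence steps halt and escalate rather than execute, while failures feed back into the working memory that conditions the next planning call.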
About DeepMind
DeepMind is a technology company that specializes in artificial intelligence and machine learning.
Industry: Research Services
Company Size: 501-1000
Location: London, GB