Overview
Cosmos Reason is NVIDIA’s open, customizable 7B-parameter reasoning vision-language model (VLM) for physical AI and robotics. It learns “common sense” about space, time, and physics, plans actions from video and images, and was trained with supervised fine-tuning (SFT) and reinforcement learning (RL). It is available under the NVIDIA Open Model License on GitHub and Hugging Face, and as an NVIDIA NIM.
Description
NVIDIA Cosmos Reason is a vision-language model built to power robots and vision AI agents with physical common sense and long chain-of-thought planning. It reasons over multimodal inputs (video, images, text), understands fundamentals such as space, time, and causal physics, and outputs step-by-step decisions for embodied tasks, including robot planning, autonomous-driving perception, data curation and annotation, and video-analytics agents. First shown in the 2025 GTC wave of “Cosmos world models,” it is released as an open, customizable 7B model under the NVIDIA Open Model License, with deployable endpoints via NVIDIA NIM and weights on Hugging Face and GitHub.
The Cosmos-Reason1 paper details a four-stage training pipeline (vision pretraining, general SFT, Physical-AI SFT, and Physical-AI RL) and reports gains in embodied reasoning. The research also describes larger 8B and 56B variants alongside the public 7B release. NVIDIA further claims state-of-the-art results on physical-reasoning leaderboards and an average score of 65.7 across key robotics and autonomous-vehicle (AV) benchmarks.
About NVIDIA
Industry: Computer Hardware Manufacturing
Company Size: 10001+
Location: Santa Clara, California, US
Website: nvidia.com
Last updated: October 3, 2025