Xiaomi

At Xiaomi, we believe technology’s true power lies in its ability to understand and enhance the human experience. This year, with the upgraded Xiaomi HyperOS 2 and the newly launched Xiaomi HyperAI, we are redefining connection.

Beijing, China

🇨🇳

AI Native

Number of tools

Profitable: Yes

Valuation: $122.20B

Tools 0 Models 6 Robots 3 Devices 1 Papers 20 Repositories 84

Models

Gen 3

Xiaomi MiMo V2 TTS

MiMo-V2-TTS is Xiaomi’s large-scale speech synthesis model built for expressive agent voice, aiming for natural, emotionally aware speech.

🔊Text to speech 🎤Voice changing 🎙️Voiceovers 🎤Singing

New

Audio

Released 6d ago

Gen 3

Xiaomi MiMo V2 Omni

MiMo-V2-Omni is an omni foundation model that unifies multimodal understanding with agentic capability, built to see, hear, and act.

📚Large Language Models 🎥Video summaries 📚Audio summaries 🔍Image interpretation

New

Multimodal

Released 6d ago

Gen 3

Xiaomi MiMo V2 Pro

MiMo-V2-Pro is Xiaomi’s flagship foundation model built for real-world agent workloads, designed to act as the “brain” of agent systems that orchestrate complex workflows and tool use.

📚Large Language Models 🤖Agents 🔄Workflow optimization 💻Conversational coding

New

Text

Released 6d ago

Gen 3

Xiaomi Robotics 0

Xiaomi-Robotics-0 is a 4.7B-parameter open Vision-Language-Action model that uses a Mixture-of-Transformers design, combining a Qwen3-based vision-language brain with a diffusion transformer controller for smooth, real-time robot manipulation on benchmarks and real robots.

🔒Private conversations 📝Lecture summaries 🖼️Blog images

New

Multimodal

Released 1mo ago

Gen 7

MiMo V2 Flash

No public technical documentation was found for a distinct “MiMo V2 Flash” model beyond Xiaomi’s MiMo-7B and MiMo-VL releases, so no reliable description of this variant is given here.

Text

Released 3mo ago

Gen 4

Pixel Perfect Depth

Pixel-Perfect Depth is a monocular depth estimation model that uses pixel-space diffusion transformers to predict high-quality depth maps, free of flying pixels, for dense point clouds. It was accepted at NeurIPS 2025.

🖼️Image generation 🔍Image upscaling 🔍Image recognition

Image

Released 5mo ago

Papers

Learning Diverse Skills for Behavior Models with Mixture of Experts
1 author

Utonia: Toward One Encoder for All Point Clouds
The University of Hong Kong
Published on: 2026-03-03
1 author

LaST-VLA: Thinking in Latent Spatio-Temporal Space for Vision-Language-Action in Autonomous Driving
Published on: 2026-03-02
1 author

EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models
Wuhan University
Published on: 2026-02-27
1 author

MSJoE: Jointly Evolving MLLM and Sampler for Efficient Long-Form Video Understanding
Tongji University
Published on: 2026-02-26
1 author

ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding
Huazhong University of Science and Technology
Published on: 2026-02-26
1 author

UFO: Unifying Feed-Forward and Optimization-based Methods for Large Driving Scene Modeling
University of Illinois Urbana-Champaign
Published on: 2026-02-24
1 author

From Pairs to Sequences: Track-Aware Policy Gradients for Keypoint Detection
Wuhan University
Published on: 2026-02-24
1 author

VGGDrive: Empowering Vision-Language Models with Cross-View Geometric Grounding for Autonomous Driving
Tianjin University
Published on: 2026-02-24
1 author

Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution
Xiaomi Robotics
Published on: 2026-02-13
1 author

HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model
Tsinghua University
Published on: 2026-02-12
1 author

Federated Balanced Learning
Published on: 2026-02-09
1 author

DriveWorld-VLA: Unified Latent-Space World Modeling with Vision-Language-Action for Autonomous Driving
Published on: 2026-02-06
1 author

MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning
Huazhong University of Science and Technology
Published on: 2026-02-05
1 author

From Chains to Graphs: Self-Structured Reasoning for General-Domain LLMs
University of Tokyo
Published on: 2026-01-20
1 author

Pixel-Perfect Visual Geometry Estimation
Published on: 2026-01-08
1 author

DriveLaW: Unifying Planning and Video Generation in a Latent Driving World
Huazhong University of Science and Technology
Published on: 2025-12-31
1 author

Flow2GAN: Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-step High-Fidelity Audio Generation
Published on: 2025-12-29
1 author

GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation
The University of Hong Kong
Published on: 2025-12-19
1 author

DVGT: Driving Visual Geometry Transformer
Tsinghua University
Published on: 2025-12-18
1 author
