Papers
- Learning Diverse Skills for Behavior Models with Mixture of Experts
- Utonia: Toward One Encoder for All Point Clouds
- LaST-VLA: Thinking in Latent Spatio-Temporal Space for Vision-Language-Action in Autonomous Driving
- EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models
- MSJoE: Jointly Evolving MLLM and Sampler for Efficient Long-Form Video Understanding
- ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding
- UFO: Unifying Feed-Forward and Optimization-based Methods for Large Driving Scene Modeling
- From Pairs to Sequences: Track-Aware Policy Gradients for Keypoint Detection
- VGGDrive: Empowering Vision-Language Models with Cross-View Geometric Grounding for Autonomous Driving
- Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution
- HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model
- Federated Balanced Learning
- DriveWorld-VLA: Unified Latent-Space World Modeling with Vision-Language-Action for Autonomous Driving
- MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning
- From Chains to Graphs: Self-Structured Reasoning for General-Domain LLMs
- Pixel-Perfect Visual Geometry Estimation
- DriveLaW: Unifying Planning and Video Generation in a Latent Driving World
- Flow2GAN: Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-step High-Fidelity Audio Generation
- GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation
- DVGT: Driving Visual Geometry Transformer
