Papers
-
World Guidance: World Modeling in Condition Space for Action Generation
-
Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space
-
BitDance: Scaling Autoregressive Generative Models with Binary Tokens
-
BagelVLA: Enhancing Long-Horizon Manipulation via Interleaved Vision-Language-Action Generation
-
Post-LayerNorm Is Back: Stable, Expressive, and Deep
-
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
-
GR-Dexter Technical Report
-
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
-
ThinkGen: Generalized Thinking for Visual Generation
-
GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
-
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model
-
UniUGP: Unifying Understanding, Generation, and Planning For End-to-end Autonomous Driving
-
MEF: A Systematic Evaluation Framework for Text-to-Image Models
-
Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets
-
Diffusion Adversarial Post-Training for One-Step Video Generation
-
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
-
Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice
-
Unconditional Diffusion for Generative Sequential Recommendation
-
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition
-
Model Merging in Pre-training of Large Language Models
-
Investigating the Overlooked Hessian Structure: From CNNs to LLMs
-
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
-
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
-
Long Context Tuning for Video Generation
-
Reviving The Classics: Active Reward Modeling in Large Language Model Alignment
-
Understanding Chain-of-Thought in LLMs through Information Theory
-
Tarsier: Recipes for Training and Evaluating Large Video Description Models
-
Magic-Me: Identity-Specific Video Customized Diffusion
-
Speech Translation with Large Language Models: An Industrial Practice
-
Connecting Speech Encoder and Large Language Model for ASR
-
Monolith: Real Time Recommendation System With Collisionless Embedding Table
-
Learning When to Translate for Streaming Speech
