Papers
-
Imagination Helps Visual Reasoning, But Not Yet in Latent Space
-
MSJoE: Jointly Evolving MLLM and Sampler for Efficient Long-Form Video Understanding
-
ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding
-
AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression
-
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
-
SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation
-
SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model
-
Generative Recommendation for Large-Scale Advertising
-
VGG-T3: Offline Feed-Forward 3D Reconstruction at Scale
-
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
-
TrajTok: Learning Trajectory Tokens enables better Video Understanding
-
The Art of Efficient Reasoning: Data, Reward, and Optimization
-
World Guidance: World Modeling in Condition Space for Action Generation
-
The Design Space of Tri-Modal Masked Diffusion Models
-
WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
-
ActionEngine: From Reactive to Programmatic GUI Agents via State Machine Memory
-
UFO: Unifying Feed-Forward and Optimization-based Methods for Large Driving Scene Modeling
-
From Pairs to Sequences: Track-Aware Policy Gradients for Keypoint Detection
-
VGGDrive: Empowering Vision-Language Models with Cross-View Geometric Grounding for Autonomous Driving
-
Test-Time Training with KV Binding Is Secretly Linear Attention
-
Agents of Chaos
-
S-PRESSO: Ultra Low Bitrate Sound Effect Compression With Diffusion Autoencoders And Offline Quantization
-
gQIR: Generative Quanta Image Reconstruction
-
Compositional Planning with Jumpy World Models
-
SenTSR-Bench: Thinking with Injected Knowledge for Time-Series Reasoning
-
Toward the Thermodynamic Limit: Neural Operators for Non-equilibrium Dynamics of Mott Insulators
-
Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining
-
Haitao Lin
-
How to Train Your Deep Research Agent? Prompt, Reward, and Policy Optimization in Search-R1
-
Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians
-
Event-Triggered Gossip for Distributed Learning
-
MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
-
Discovering Multiagent Learning Algorithms with Large Language Models
-
HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation
-
ASTRA-bench: Evaluating Tool-Use Agent Reasoning and Action Planning with Personal User Context
-
Wink: Recovering from Misbehaviors in Coding Agents
-
Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space
-
Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control (Stanford University)
-
The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning
-
SARAH: Spatially Aware Real-time Agentic Humans
-
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
-
Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
-
El Agente Gráfico: Structured Execution Graphs for Scientific Agents
-
Unified Latents (UL): How to train your latents
-
Learning to Learn from Language Feedback with Social Meta-Learning
-
Flow Map Language Models: One-step Language Modeling via Continuous Denoising
-
OpenSage: Self-programming Agent Generation Engine
-
Multi-agent cooperation through in-context co-player inference
-
Factored Latent Action World Models
