Papers
-
Slim attention: cut your context memory in half without loss – K-cache is all you need for MHA
-
Process-of-Thought Reasoning for Videos
-
Mode Seeking meets Mean Seeking for Fast Long Video Generation
-
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
-
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
-
SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation (Tsinghua University)
-
SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model
-
Generative Recommendation for Large-Scale Advertising
-
VGG-T3: Offline Feed-Forward 3D Reconstruction at Scale
-
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
-
TrajTok: Learning Trajectory Tokens enables better Video Understanding
-
World Guidance: World Modeling in Condition Space for Action Generation
-
The Design Space of Tri-Modal Masked Diffusion Models
-
WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
-
Test-Time Training with KV Binding Is Secretly Linear Attention
-
gQIR: Generative Quanta Image Reconstruction
-
Compositional Planning with Jumpy World Models
-
SenTSR-Bench: Thinking with Injected Knowledge for Time-Series Reasoning
-
Toward the Thermodynamic Limit: Neural Operators for Non-equilibrium Dynamics of Mott Insulators
-
Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining
-
Haitao Lin
-
How to Train Your Deep Research Agent? Prompt, Reward, and Policy Optimization in Search-R1
-
Discovering Multiagent Learning Algorithms with Large Language Models
-
ASTRA-bench: Evaluating Tool-Use Agent Reasoning and Action Planning with Personal User Context
-
Wink: Recovering from Misbehaviors in Coding Agents
-
Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space
-
Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control (Stanford University)
-
The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning
-
SARAH: Spatially Aware Real-time Agentic Humans
-
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
-
El Agente Gráfico: Structured Execution Graphs for Scientific Agents
-
Unified Latents (UL): How to train your latents
-
Tuning-free Visual Effect Transfer across Videos
-
EVMbench: Evaluating AI Agents on Smart Contract Security
-
EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data
-
Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching
-
jina-embeddings-v5-text: Task-Targeted Embedding Distillation
-
World Action Models are Zero-shot Policies
-
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers
-
GLM-5: from Vibe Coding to Agentic Engineering
-
Image Generation with a Sphere Encoder
-
BitDance: Scaling Autoregressive Generative Models with Binary Tokens
-
Experiential Reinforcement Learning
-
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
-
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
-
Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment
-
Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning
-
WizardLM: Empowering large pre-trained language models to follow complex instructions
