Papers
-
MAI-Thinking-1: Building a Hill-Climbing Machine
-
Writing Code vs. Shipping Code: Productivity Effects Across Generations of AI Coding Tools
Microsoft / Massachusetts Institute of Technology, National Bureau of Economic Research (NBER), University of Pennsylvania
-
Memento: Teaching LLMs to Manage Their Own Context
-
The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More
-
CyberThreat-Eval: Can Large Language Models Automate Real-World Threat Research?
-
Chow-Liu Ordering for Long-Context Reasoning in Chain-of-Agents
-
AutoAdapt: An Automated Domain Adaptation Framework for LLMs
-
OSS-CRS: Liberating AIxCC Cyber Reasoning Systems for Real-World Open-Source Security
-
StreamWise: Serving Multi-Modal Generation in Real-Time at Scale
-
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs
-
LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis
-
OpenFrontier: General Navigation with Visual-Language Grounded Frontiers
-
On the Necessity of Learnable Sheaf Laplacians
-
Scaling Agentic Capabilities, Not Context: Efficient Reinforcement Finetuning for Large Toolspaces
-
Pretrained Vision-Language-Action Models are Surprisingly Resistant to Forgetting in Continual Learning
-
Phi-4-reasoning-vision-15B Technical Report
-
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?
-
Beyond Pixel Histories: World Models with Persistent 3D State
-
Modular Memory is the Key to Continual Learning Agents
-
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
-
WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
-
Experiential Reinforcement Learning
-
WizardLM: Empowering large pre-trained language models to follow complex instructions
-
Florence: A New Foundation Model for Computer Vision
-
LLM-in-Sandbox Elicits General Agentic Intelligence
-
On-Policy Context Distillation for Language Models
-
CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation
-
See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning
-
LIVE: Long-horizon Interactive Video World Modeling
-
Closing the Loop: Universal Repository Representation with RPG-Encoder
-
CUA-Skill: Develop Skills for Computer Using Agent
-
AgentRx: Diagnosing AI Agent Failures from Execution Trajectories
-
X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests
-
LLM-42: Enabling Determinism in LLM Inference with Verified Speculation
-
Lost in Transmission: When and Why LLMs Fail to Reason Globally
-
Efficient Autoregressive Video Diffusion with Dummy Head
Microsoft / ETH Zurich, Johns Hopkins University, Tsinghua University, University of Science and Technology of China
-
Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs
Microsoft / Hong Kong University of Science and Technology, Massachusetts Institute of Technology, Shanghai Artificial Intelligence Laboratory, Shanghai Jiao Tong University, Tsinghua University
-
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
-
Controlled LLM Training on Spectral Sphere
-
InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training
-
Thinking with Blueprints: Assisting Vision-Language Models in Spatial Reasoning via Structured Object Representation
-
From Word to World: Can Large Language Models be Implicit Text-based World Models?
-
Sigma-MoE-Tiny Technical Report
-
FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction
-
Spatia: Video Generation with Updatable Spatial Memory
-
Native and Compact Structured Latents for 3D Generation
-
Wait, Wait, Wait... Why Do Reasoning Models Loop?
-
Glance: Accelerating Diffusion Models with 1 Sample
-
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
