Papers
-
HybridToken-VLM: Hybrid Token Compression for Vision-Language Models
-
MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models
-
Process Reward Models That Think
-
Distribution Matching Variational AutoEncoder
-
Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch
-
UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation
-
Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents
-
Unsupervised decoding of encoded reasoning using language model interpretability
-
EditThinker: Unlocking Iterative Reasoning for Any Image Editor
-
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
-
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing
-
Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability
-
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
-
Learning to Orchestrate Agents in Natural Language with the Conductor
-
TRINITY: An Evolved LLM Coordinator
-
SIMA 2: A Generalist Embodied Agent for Virtual Worlds
-
SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control
-
Training LLMs for Honesty via Confessions
-
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
-
VIGS-SLAM: Visual Inertial Gaussian Splatting SLAM
-
The Art of Scaling Test-Time Compute for Large Language Models
-
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
Meta Platforms / King Abdullah University of Science and Technology (KAUST), The University of Hong Kong, University of Waterloo
-
The Adoption and Usage of AI Agents: Early Evidence from Perplexity
-
ThetaEvolve: Test-time Learning on Open Problems
Microsoft / Carnegie Mellon University, University of California, University of Washington, University of Wisconsin-Madison
-
LatBot: Distilling Universal Latent Actions for Vision-Language-Action Models
-
Canvas-to-Image: Compositional Image Generation with Multimodal Controls
-
LayerComposer: Multi-Human Personalized Generation via Layered Canvas
-
UI-CUBE: Enterprise-Grade Computer Use Agent Benchmarking Beyond Task Accuracy to Operational Reliability
-
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
-
Early Science Acceleration Experiments with GPT-5
OpenAI / Collège de France, Columbia University, Harvard University, Lawrence Livermore National Laboratory, The Jackson Laboratory, University of California, University of Cambridge, University of Oxford, Vanderbilt University
-
Anthropic Economic Index report: Uneven geographic and enterprise AI adoption
-
Weight-Sparse Transformers Have Interpretable Circuits
-
SageServe: Optimizing LLM Serving on Cloud Data Centers with Forecast Aware Auto-Scaling
-
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
-
Steering Language Models with Weight Arithmetic
University of Copenhagen
-
Step-Audio-EditX Technical Report
-
s3: You Don't Need That Much Data to Train a Search Agent via RL
-
Kimi Linear: An Expressive, Efficient Attention Architecture
Moonshot AI / Hangzhou Institute of Medicine, Hong Kong University of Science and Technology, Massachusetts Institute of Technology, Sichuan Univ
-
Continuous Autoregressive Language Models
-
Context Engineering 2.0: The Context of Context Engineering
-
Charts Are Not Images: On the Challenges of Scientific Chart Editing
-
Signs of introspection in large language models
-
An efficient probabilistic hardware architecture for diffusion-like models
-
Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer
-
Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency
-
Generating Creative Chess Puzzles
-
Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
Lila Sciences / Allen Institute for Artificial Intelligence, Carnegie Mellon University, Stanford University, University of Washington
-
A Comprehensive Survey on Reinforcement Learning-based Agentic Search: Foundations, Roles, Optimizations, Evaluations, and Applications
-
VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions
Pohang University of Science and Technology, Ulsan National Institute of Science and Technology
-
Step2Motion: Locomotion Reconstruction from Pressure Sensing Insoles
Max Planck Institute for Informatics, Universitat Politècnica de Catalunya
