Papers
-
The Art of Scaling Test-Time Compute for Large Language Models
-
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
-
The Adoption and Usage of AI Agents: Early Evidence from Perplexity
-
ThetaEvolve: Test-time Learning on Open Problems
-
LatBot: Distilling Universal Latent Actions for Vision-Language-Action Models
-
Canvas-to-Image: Compositional Image Generation with Multimodal Controls
-
LayerComposer: Multi-Human Personalized Generation via Layered Canvas
-
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
-
Early Science Acceleration Experiments with GPT-5
-
Anthropic Economic Index report: Uneven geographic and enterprise AI adoption
-
Weight-Sparse Transformers Have Interpretable Circuits
-
SageServe: Optimizing LLM Serving on Cloud Data Centers with Forecast Aware Auto-Scaling
-
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
-
Steering Language Models with Weight Arithmetic (University of Copenhagen)
-
Step-Audio-EditX Technical Report
-
s3: You Don't Need That Much Data to Train a Search Agent via RL
-
Kimi Linear: An Expressive, Efficient Attention Architecture
-
Continuous Autoregressive Language Models
-
Charts Are Not Images: On the Challenges of Scientific Chart Editing
-
Signs of introspection in large language models
-
An efficient probabilistic hardware architecture for diffusion-like models
-
Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer
-
Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency
-
Generating Creative Chess Puzzles
-
Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond) (Lila Sciences / Allen Institute for Artificial Intelligence, Carnegie Mellon University, Stanford University, University of Washington)
-
A Comprehensive Survey on Reinforcement Learning-based Agentic Search: Foundations, Roles, Optimizations, Evaluations, and Applications
-
VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions (Pohang University of Science and Technology, Ulsan National Institute of Science and Technology)
-
Step2Motion: Locomotion Reconstruction from Pressure Sensing Insoles (Max Planck Institute for Informatics, Universitat Politècnica de Catalunya)
-
HunyuanVideo 1.5 Technical Report
-
Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers and Gradient Clipping
-
Evaluating honesty and lie detection techniques on a diverse suite of dishonest models
-
ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning
-
ATLAS: Practical Scaling Laws for Multilingual Models
-
Neural Texture Splatting: Expressive 3D Gaussian Splatting for View Synthesis, Geometry, and Dynamic Reconstruction
-
AlphaFlow: Understanding and Improving MeanFlow Models
-
Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning
-
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
-
Rope to Nope and Back Again: A New Hybrid Attention Strategy
-
MEF: A Systematic Evaluation Framework for Text-to-Image Models
-
Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets
-
Kaleido: Open-Sourced Multi-Subject Reference Video Generation Model
-
Glyph: Scaling Context Windows via Visual-Text Compression
-
Arctic-Extract Technical Report
-
Cortex AISQL: A Production SQL Engine for Unstructured Data
-
When Intelligence Fails: An Empirical Study on Why LLMs Struggle with Password Cracking (Future Data Minds Research Lab)
-
Chronos-2: From Univariate to Universal Forecasting
-
Agentic Misalignment: How LLMs Could Be Insider Threats (University College London)
-
Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving
-
Shifting Work Patterns with Generative AI
-
SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning
