Papers
-
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
-
Reasoning Models Generate Societies of Thought
-
Hardware Acceleration for Neural Networks: A Comprehensive Survey (Arizona State University)
-
RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension
-
TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback
-
Apollo: Unified Audio-Video Joint Generation
-
Controlled LLM Training on Spectral Sphere
-
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
-
Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models (The Hong Kong Polytechnic University)
-
AgriAgent: Contract-Driven Planning and Capability-Aware Tool Orchestration in Real-World Agriculture
-
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
-
Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL
-
BabyVision: Visual Reasoning Beyond Language
-
RigMo: Unifying Rig and Motion Learning for Generative Animation
-
GenCtrl -- A Formal Controllability Toolkit for Generative Models
-
Sprint: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers
-
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards
-
GR-Dexter Technical Report
-
InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training
-
Internal Representations as Indicators of Hallucinations in Agent Tool Selection
-
VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
-
ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation
-
Pearmut: Human Evaluation of Translation Made Trivial
-
Listen to Rhythm, Choose Movements: Autoregressive Multimodal Dance Generation via Diffusion and Mamba with Decoupled Dance Dataset
-
RIMRULE: Improving Tool-Using Language Agents via MDL-Guided Rule Learning
-
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
-
Thinking with Blueprints: Assisting Vision-Language Models in Spatial Reasoning via Structured Object Representation
-
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
-
ELLA: Efficient Lifelong Learning for Adapters
-
Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes
-
mHC: Manifold-Constrained Hyper-Connections
-
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
-
AHA! Animating Human Avatars in Diverse Scenes with Gaussian Splatting
-
Animated 3DGS Avatars in Diverse Scenes with Consistent Lighting and Shadows
-
NarrativeTrack: Evaluating Video Language Models Beyond the Frame
-
Delay-Tolerant Networking for Tsunami Evacuation on the Small Island of Hachijojima: A Study of Epidemic and Prophet Routing
-
GARDO: Reinforcing Diffusion Models without Reward Hacking
-
ThinkGen: Generalized Thinking for Visual Generation
-
Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration
-
DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO
-
SemanticGen: Video Generation in Semantic Space
-
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
-
FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models
-
GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
-
From Word to World: Can Large Language Models be Implicit Text-based World Models?
-
Sigma-MoE-Tiny Technical Report
-
Journey Before Destination: On the importance of Visual Faithfulness in Slow Thinking
-
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
-
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
-
Kling-Omni Technical Report
