Papers
-
Power Aware Dynamic Reallocation For Inference
-
ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents
-
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents
-
Agentic Reasoning for Large Language Models
-
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
-
FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning
-
VINO: A Unified Visual Generator with Interleaved OmniModal Context
-
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta
-
OCTOBENCH: Benchmarking Scaffold-Aware Instruction Following in Repository-Grounded Agentic Coding
-
TranslateGemma Technical Report
-
AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts
-
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
-
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
-
Reasoning Models Generate Societies of Thought
-
Hardware Acceleration for Neural Networks: A Comprehensive Survey (Arizona State University)
-
RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension
-
Silence the Judge: Reinforcement Learning with Self-Verifier via Latent Geometric Clustering
-
TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback
-
Apollo: Unified Audio-Video Joint Generation
-
Controlled LLM Training on Spectral Sphere
-
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
-
Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models (The Hong Kong Polytechnic University)
-
AgriAgent: Contract-Driven Planning and Capability-Aware Tool Orchestration in Real-World Agriculture
-
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
-
Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL
-
BabyVision: Visual Reasoning Beyond Language
-
RigMo: Unifying Rig and Motion Learning for Generative Animation
-
AIE4ML: An End-to-End Framework for Compiling Neural Networks for the Next Generation of AMD AI Engines
-
FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments
-
UniFinEval: Towards Unified Evaluation of Financial Multimodal Models across Text, Images and Videos
-
Rotate Your Character: Revisiting Video Diffusion Models for High-Quality 3D Character Generation
-
One Language-Free Foundation Model Is Enough for Universal Vision Anomaly Detection
-
GenCtrl -- A Formal Controllability Toolkit for Generative Models
-
Sprint: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers
-
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards
-
GR-Dexter Technical Report
-
Challenges and Research Directions for Large Language Model Inference Hardware
-
Pixel-Perfect Visual Geometry Estimation
-
DocDancer: Towards Agentic Document-Grounded Information Seeking
-
Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing
-
InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training
-
Internal Representations as Indicators of Hallucinations in Agent Tool Selection
-
VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
-
FOREVER: Forgetting Curve-Inspired Memory Replay for Language Model Continual Learning
-
ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation
-
Extracting books from production language models
-
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents
-
A Versatile Multimodal Agent for Multimedia Content Generation
-
Efficient Context Scaling with LongCat ZigZag Attention
-
Pearmut: Human Evaluation of Translation Made Trivial
