Papers
-
STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks
-
AI+HW 2035: Shaping the Next Decade
-
FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling
-
Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline
-
KARL: Knowledge Agents via Reinforcement Learning
-
FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning
-
AgentIR: Reasoning-Aware Retrieval for Deep Research Agents (Carnegie Mellon University, University of Queensland, University of Waterloo)
-
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
-
Adaptive Memory Admission Control for LLM Agents
-
ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training
-
Pretrained Vision-Language-Action Models are Surprisingly Resistant to Forgetting in Continual Learning
-
Single-minus graviton tree amplitudes are nonzero
-
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning
-
Pointer-CAD: Unifying B-Rep and Command Sequences via Pointer-based Edges & Faces Selection
-
RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots
-
InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions
-
ManipulationNet: An Infrastructure for Benchmarking Real-World Robot Manipulation with Physical Skill Challenges and Embodied Multimodal Reasoning
-
V1: Unifying Generation and Self-Verification for Parallel Reasoners
-
Phi-4-reasoning-vision-15B Technical Report
-
Helios: Real Real-Time Long Video Generation Model
-
EvoSkill: Automated Skill Discovery for Multi-Agent Systems
-
Speculative Speculative Decoding
-
OneRanker: Unified Generation and Ranking with One Model in Industrial Advertising Recommendation
-
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?
-
NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing
-
Kling-MotionControl Technical Report
-
Architecting Trust in Artificial Epistemic Agents
-
Utonia: Toward One Encoder for All Point Clouds
-
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory
-
Beyond Pixel Histories: World Models with Persistent 3D State
-
Heterogeneous Agent Collaborative Reinforcement Learning
-
Beyond Language Modeling: An Exploration of Multimodal Pretraining
-
Modular Memory is the Key to Continual Learning Agents
-
Expanding LLM Agent Boundaries with Strategy-Guided Exploration
-
ROBOMETER: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons
-
LaST-VLA: Thinking in Latent Spatio-Temporal Space for Vision-Language-Action in Autonomous Driving
-
Agentic Code Reasoning
-
CuTe Layout Representation and Algebra
-
RubricBench: Aligning Model-Generated Rubrics with Human Standards
-
WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories
-
How Well Does Agent Development Reflect Real-World Work?
-
Learn Hard Problems During RL with Reference Guided Fine-tuning
-
SSKG Hub: An Expert-Guided Platform for LLM-Empowered Sustainability Standards Knowledge Graphs
-
Process-of-Thought Reasoning for Videos
-
Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models (Helmholtz Munich, Technical University of Munich, University of Tübingen)
-
EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models
-
Mode Seeking meets Mean Seeking for Fast Long Video Generation
-
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
-
MSJoE: Jointly Evolving MLLM and Sampler for Efficient Long-Form Video Understanding
-
ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding
