Papers
-
OSExpert: Computer-Use Agents Learning Professional Skills via Exploration
-
Emergence is Overrated: AGI as an Archipelago of Experts
-
Extend Your Horizon: A Device-Agnostic Surgical Tool Tracking Framework with Multi-View Optimization for Augmented Reality
-
On the Feasibility and Opportunity of Autoregressive 3D Object Detection
-
TeamHOI: Learning a Unified Policy for Cooperative Human-Object Interactions with Any Team Size
-
AutoTraces: Autoregressive Trajectory Forecasting via Multimodal Large Language Models
-
MJ1: Multimodal Judgment via Grounded Verification
-
CMMR-VLN: Vision-and-Language Navigation via Continual Multimodal Memory Retrieval
-
Aero-Promptness: Drag-Aware Aerodynamic Manipulability for Propeller-driven Vehicles
-
SmartThinker: Progressive Chain-of-Thought Length Calibration for Efficient Large Language Model Reasoning
-
Amortizing Maximum Inner Product Search with Learned Support Functions
-
ViSA-Enhanced Aerial VLN: A Visual-Spatial Reasoning Enhanced Framework for Aerial Vision-Language Navigation
-
It's Time to Get It Right: Improving Analog Clock Reading and Clock-Hand Spatial Reasoning in Vision-Language Models
-
PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents
-
FedMomentum: Preserving LoRA Training Momentum in Federated Fine-Tuning
-
Alignment-Process-Outcome: Rethinking How AIs and Humans Collaborate
-
Missing No More: Dictionary-Guided Cross-Modal Image Fusion under Missing Infrared
-
VSDiffusion: Taming Ill-Posed Shadow Generation via Visibility-Constrained Diffusion
-
AffordGrasp: Cross-Modal Diffusion for Affordance-Aware Grasp Synthesis
-
Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization
-
Not Like Transformers: Drop the Beat Representation for Dance Generation with Mamba-Based Diffusion Model
-
ConflictBench: Evaluating Human-AI Conflict via Interactive and Visually Grounded Environments
-
DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention
-
Controllable Complex Human Motion Video Generation via Text-to-Skeleton Cascades
-
QualiTeacher: Quality-Conditioned Pseudo-Labeling for Real-World Image Restoration
-
GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables
-
Solution to the 10th ABAW Expression Recognition Challenge: A Robust Multimodal Framework with Safe Cross-Attention and Modality Dropout
-
CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling
-
S2S-FDD: Bridging Industrial Time Series and Natural Language for Explainable Zero-shot Fault Diagnosis
-
Examining the Role of YouTube Production and Consumption Dynamics on the Formation of Extreme Ideologies
-
Speed3R: Sparse Feed-forward 3D Reconstruction Models
-
See and Switch: Vision-Based Branching for Interactive Robot-Skill Programming
-
Stabilized Fine-Tuning with LoRA in Federated Learning: Mitigating the Side Effect of Client Size and Rank via the Scaling Factor
-
ImageEdit-R1: Boosting Multi-Agent Image Editing via Reinforcement Learning
-
Adversarial Domain Adaptation Enables Knowledge Transfer Across Heterogeneous RNA-Seq Datasets
-
Enhancing Cross-View UAV Geolocalization via LVLM-Driven Relational Modeling
-
Evaluating Generative Models via One-Dimensional Code Distributions
-
Deterministic Differentiable Structured Pruning for Large Language Models
-
In-Context Reinforcement Learning for Tool Use in Large Language Models
-
Synthetic Defect Image Generation for Power Line Insulator Inspection Using Multimodal Large Language Models
-
AgentOS: From Application Silos to a Natural Language-Driven Data Ecosystem
-
PlayWorld: Learning Robot World Models from Autonomous Play (Princeton University)
-
AtomVLA: Scalable Post-Training for Robotic Manipulation via Predictive Latent World Models (Huazhong University of Science and Technology, The University of Hong Kong, Tsinghua University)
-
Scale Space Diffusion (University of Maryland)
-
Agentic Critical Training (University of Maryland)
-
RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback (National University of Singapore, Shanghai AI Lab)
-
PostTrainBench: Can LLM Agents Automate LLM Post-Training? (ELLIS Institute Tübingen, Max Planck Institute for Intelligent Systems, Tübingen AI Center, University of Tübingen)
-
$OneMillion-Bench: How Far are Language Agents from Human Experts?
-
How Far Can Unsupervised RLVR Scale LLM Training? (Peking University, Shanghai AI Lab, Shanghai Jiao Tong University, Tsinghua University, University of Illinois Urbana-Champaign, Xi’an Jiaotong University)
-
Sparsity and Out-of-Distribution Generalization
