Papers
-
Fusion Complexity Inversion: Why Simpler Cross View Modules Outperform SSMs and Cross View Attention Transformers for Pasture Biomass Regression
-
Column Generation for the Micro-Transit Zoning Problem
-
Benchmarking Large Language Models for Quebec Insurance: From Closed-Book to Retrieval-Augmented Generation
-
Transferable Optimization Network for Cross-Domain Image Reconstruction
-
GazeShift: Unsupervised Gaze Estimation and Dataset for VR
-
Gradient Iterated Temporal-Difference Learning
-
AI Misuse in Education Is a Measurement Problem: Toward a Learning Visibility Framework
-
DistillGuard: Evaluating Defenses Against LLM Knowledge Distillation
-
AI Steerability 360: A Toolkit for Steering Large Language Models
-
On the Formal Limits of Alignment Verification
-
Training-free Temporal Object Tracking in Surgical Videos
-
An Efficient and Effective Evaluator for Text2SQL Models on Unseen and Unlabeled Data
-
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence
Beihang University, Fudan University, Hong Kong University of Science and Technology, Nanyang Technological University, Northwestern Polytechnical University, Peking University, Shanghai AI Lab, Shanghai Jiao Tong University, Sichuan University, The Chinese University of Hong Kong, Tsinghua University
-
ReconDrive: Fast Feed-Forward 4D Gaussian Splatting for Autonomous Driving Scene Reconstruction
King's College London, Mohamed bin Zayed University of Artificial Intelligence, The University of Hong Kong, The University of Sydney
-
Scalable Training of Mixture-of-Experts Models with Megatron Core
-
Intentional Deception as Controllable Capability in LLM Agents
University of Idaho
-
Retinex Meets Language: A Physics-Semantics-Guided Underwater Image Enhancement Network
-
Aligning What EEG Can See: Structural Representations for Brain-Vision Matching
-
CoTJudger: A Graph-Driven Framework for Automatic Evaluation of Chain-of-Thought Efficiency and Redundancy in LRMs
-
Entropy-Aware On-Policy Distillation of Language Models
-
VLN-Cache: Enabling Token Caching for VLN Models with Visual/Semantic Dynamics Awareness
-
Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction
-
Countdown-Code: A Testbed for Studying The Emergence and Generalization of Reward Hacking in RLVR
-
mAVE: A Watermark for Joint Audio-Visual Generation Models
-
Statistical Contraction for Chance-Constrained Trajectory Optimization of Non-Gaussian Stochastic Systems
-
Facial Expression Generation Aligned with Human Preference for Natural Dyadic Interaction
-
NuNext: Reframing Nucleus Detection as Next-Point Detection
-
Grounding Machine Creativity in Game Design Knowledge Representations: Empirical Probing of LLM-Based Executable Synthesis of Goal Playable Patterns under Structural Constraints
-
Efficient Personalized Reranking with Semi-Autoregressive Generation and Online Knowledge Distillation
-
Deep Generative Spatiotemporal Engression for Probabilistic Forecasting of Epidemics
-
Vision Language Models Cannot Reason About Physical Transformation
-
Enhancing Consistency of Werewolf AI through Dialogue Summarization and Persona Information
-
Efficient Chest X-ray Representation Learning via Semantic-Partitioned Contrastive Learning
-
aCAPTCHA: Verifying That an Entity Is a Capable Agent via Asymmetric Hardness
-
Turn: A Language for Agentic Computation
-
TIQA: Human-Aligned Text Quality Assessment in Generated Images
-
Inter-Image Pixel Shuffling for Multi-focus Image Fusion
-
Combining Adam and its Inverse Counterpart to Enhance Generalization of Deep Learning Optimizers
-
Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge
-
The Model Knows Which Tokens Matter: Automatic Token Selection via Noise Gating
-
Emotion Transcription in Conversation: A Benchmark for Capturing Subtle and Complex Emotional States through Natural Language
-
PDD: Manifold-Prior Diverse Distillation for Medical Anomaly Detection
-
CanoVerse: 3D Object Scalable Canonicalization and Dataset for Generation and Pose
-
LiveWorld: Simulating Out-of-Sight Dynamics in Generative Video World Models
-
Fine-Grained Table Retrieval Through the Lens of Complex Queries
-
Agentic Planning with Reasoning for Image Styling via Offline RL
-
AMB-DSGDN: Adaptive Modality-Balanced Dynamic Semantic Graph Differential Network for Multimodal Emotion Recognition
-
Improving reasoning at inference time via uncertainty minimisation
-
Spectral Conditioning of Attention Improves Transformer Performance
