Papers
-
StyleGallery: Training-free and Semantic-aware Personalized Style Transfer from Arbitrary Image References
Hunan University, National University of Defense Technology
-
Utility Function is All You Need: LLM-based Congestion Control
Akamai Technologies / Fraunhofer-Institut für Sichere Informationstechnologie, Technische Universität Berlin, Weizenbaum Institute
-
HEAL: Hindsight Entropy-Assisted Learning for Reasoning Distillation
-
One Token, Two Fates: A Unified Framework via Vision Token Manipulation Against MLLMs Hallucination
Nanjing University, Southeast University
-
Geometric Autoencoder for Diffusion Models
Shanghai Innovation Institute, Tsinghua University
-
Dynamic Knowledge Fusion for Multi-Domain Dialogue State Tracking
-
Beyond Interleaving: Causal Attention Reformulations for Generative Recommender Systems
-
GeoSense: Internalizing Geometric Necessity Perception for Multimodal Reasoning
Mohamed bin Zayed University of Artificial Intelligence, Stanford University, University of Chinese Academy of Sciences, University of Science and Technology of China
-
Speech Codec Probing from Semantic and Phonetic Perspectives
University of Southern California
-
Edge-Assisted Multi-Robot Visual-Inertial SLAM with Efficient Communication
The Institute of Electrical and Electronics Engineers
-
Few-Shot Adaptation to Non-Stationary Environments via Latent Trend Embedding for Robotics
Ichinoseki College, Ritsumeikan University, The University of Osaka
-
Reactive Writers: How Co-Writing with AI Changes How We Engage with Ideas
Bauhaus University, Cornell Tech, Princeton University, University of Washington
-
Causal Concept Graphs in LLM Latent Space for Stepwise Reasoning
Daffodil International University, New York University
-
Optimal Expert-Attention Allocation in Mixture-of-Experts: A Scalable Law for Dynamic Model Design
-
Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability
Chinese Academy of Sciences, King Abdullah University of Science and Technology (KAUST), Mohamed bin Zayed University of Artificial Intelligence, Provable Responsible AI and Data Analytics Lab, The Hong Kong Polytechnic University, University of Chinese Academy of Sciences
-
Variance-Aware Adaptive Weighting for Diffusion Model Training
Kennesaw State University
-
Safe Probabilistic Planning for Human-Robot Interaction using Conformal Risk Control
University of Washington
-
Graph-GRPO: Training Graph Flow Models with Reinforcement Learning
Beihang University, Beijing University of Posts and Telecommunications, National University of Singapore, The University of Sheffield
-
Verbalizing LLM's Higher-order Uncertainty via Imprecise Probabilities
CISPA Helmholtz Center for Information Security, Manchester Centre for AI Fundamentals, Nanyang Technological University, The University of Manchester, The University of Tokyo
-
On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
Peking University, RIKEN Center for Advanced Intelligence Project, Shanghai Jiao Tong University, The Institute of Statistical Mathematics, The University of Tokyo
-
Multi-Person Pose Estimation Evaluation Using Optimal Transportation and Improved Pose Matching
-
GLM-OCR Technical Report
-
Just-in-Time: Training-Free Spatial Acceleration for Diffusion Transformers
Capital Normal University, University of Electronic Science and Technology of China
-
DiT4DiT: Jointly Modeling Video Dynamics and Actions for Generalizable Robot Control
-
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
-
Quality over Quantity: Demonstration Curation via Influence Functions for Data-Centric Robot Learning
-
Adaptive Active Learning for Online Reliability Prediction of Satellite Electronics
-
Dynamic Multi-period Experts for Online Time Series Forecasting
-
Learning Adaptive LLM Decoding
-
Verifying Good Regulator Conditions for Hypergraph Observers: Natural Gradient Learning from Causal Invariance via Established Theorems
-
Intelligent Spatial Estimation for Fire Hazards in Engineering Sites: An Enhanced YOLOv8-Powered Proximity Analysis Framework
-
A Text-Native Interface for Generative Video Authoring
-
Exclusive Self Attention
-
GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models
-
PPO-Based Hybrid Optimization for RIS-Assisted Semantic Vehicular Edge Computing
-
OmniEdit: A Training-free framework for Lip Synchronization and Audio-Visual Editing
-
Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting
-
Latent World Models for Automated Driving: A Unified Taxonomy, Evaluation Framework, and Open Challenges
-
Exploring Collatz Dynamics with Human-LLM Collaboration
-
Overcoming Valid Action Suppression in Unmasked Policy Gradient Algorithms
-
HTMuon: Improving Muon via Heavy-Tailed Spectral Correction
-
Chain of Event-Centric Causal Thought for Physically Plausible Video Generation
-
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs
-
MedKCO: Medical Vision-Language Pretraining via Knowledge-Driven Cognitive Orchestration
-
Probabilistic Hysteresis Factor Prediction for Electric Vehicle Batteries with Graphite Anodes Containing Silicon
-
Training-free Motion Factorization for Compositional Video Generation
-
Composed Vision-Language Retrieval for Skin Cancer Case Search via Joint Alignment of Global and Local Representations
-
VIVID-Med: LLM-Supervised Structured Pretraining for Deployable Medical ViTs
-
Progressive Representation Learning for Multimodal Sentiment Analysis with Incomplete Modalities
-
PM-Nav: Priori-Map Guided Embodied Navigation in Functional Buildings
