Papers
- Experiences Build Characters: The Linguistic Origins and Functional Impact of LLM Personality
- DeepSight: Bridging Depth Maps and Language with a Depth-Driven Multimodal Model
- Enhancing Neural Video Compression of Static Scenes with Positive-Incentive Noise
- Enhancing Instruction Following of LLMs via Activation Steering with Dynamic Rejection
- ButterflyViT: 354$\times$ Expert Compression for Edge Vision Transformers
- Latent Diffusion-Based 3D Molecular Recovery from Vibrational Spectra
- Making Implicit Premises Explicit in Logical Understanding of Enthymemes
- Dynamic Momentum Recalibration in Online Gradient Learning
- FedARKS: Federated Aggregation via Robust and Discriminative Knowledge Selection and Integration for Person Re-identification
- Diffusion Language Models Are Natively Length-Aware
- A Hazard-Informed Data Pipeline for Robotics Physical Safety
- Place-it-R1: Unlocking Environment-aware Reasoning Potential of MLLM for Video Object Insertion
- Spatial Colour Mixing Illusions as a Perception Stress Test for Vision-Language Models
- Predictive Coding Graphs are a Superset of Feedforward Neural Networks
- Longitudinal NSCLC Treatment Progression via Multimodal Generative Models
- Property-driven Protein Inverse Folding With Multi-Objective Preference Alignment
- VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models
- Ensemble Graph Neural Networks for Probabilistic Sea Surface Temperature Forecasting via Input Perturbations
- Efficient Vector Search in the Wild: One Model for Multi-K Queries
- Do Compact SSL Backbones Matter for Audio Deepfake Detection? A Controlled Study with RAPTOR
- Reflective Flow Sampling Enhancement
- FreeOcc: Training-free Panoptic Occupancy Prediction via Foundation Models
- A Semi-Supervised Framework for Breast Ultrasound Segmentation with Training-Free Pseudo-Label Generation and Label Refinement
- JOPP-3D: Joint Open Vocabulary Semantic Segmentation on Point Clouds and Panoramas
- Robotic Foundation Models for Industrial Control: A Comprehensive Survey and Readiness Assessment Framework
- XMACNet: An Explainable Lightweight Attention based CNN with Multi Modal Fusion for Chili Disease Classification
- Optimizing 3D Diffusion Models for Medical Imaging via Multi-Scale Reward Learning
- Making Training-Free Diffusion Segmentors Scale with the Generative Power
- Contrastive-to-Self-Supervised: A Two-Stage Framework for Script Similarity Learning
- Towards Motion Turing Test: Evaluating Human-Likeness in Humanoid Robots
- CRIMSON: A Clinically-Grounded LLM-Based Metric for Generative Radiology Report Evaluation
- SpaCRD: Multimodal Deep Fusion of Histology and Spatial Transcriptomics for Cancer Region Detection
- Random Quadratic Form on a Sphere: Synchronization by Common Noise
- Whisper-CD: Accurate Long-Form Speech Recognition using Multi-Negative Contrastive Decoding
- MAPO: Mixed Advantage Policy Optimization for Long-Horizon Multi-Turn Dialogue
- Wisdom of the AI Crowd (AI-CROWD) for Ground Truth Approximation in Content Analysis: A Research Protocol & Validation Using Eleven Large Language Models
- LIT-RAGBench: Benchmarking Generator Capabilities of Large Language Models in Retrieval-Augmented Generation
- Latent Autoencoder Ensemble Kalman Filter for Data assimilation
- FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling
- Adaptive Language-Aware Image Reflection Removal Network
- Point-Supervised Skeleton-Based Human Action Segmentation
- VG3S: Visual Geometry Grounded Gaussian Splatting for Semantic Occupancy Prediction
- EarthBridge: A Solution for 4th Multi-modal Aerial View Image Challenge Translation Track
- Topological descriptors of foot clearance gait dynamics improve differential diagnosis of Parkinsonism
- Cut to the Chase: Training-free Multimodal Summarization via Chain-of-Events
- EntON: Eigenentropy-Optimized Neighborhood Densification in 3D Gaussian Splatting
- Conversational Demand Response: Bidirectional Aggregator-Prosumer Coordination through Agentic AI
- Word-Anchored Temporal Forgery Localization
- SPOT: Span-level Pause-of-Thought for Efficient and Interpretable Latent Reasoning in Large Language Models
- FedSCS-XGB -- Federated Server-centric surrogate XGBoost for continual health monitoring
