Papers
-
An Explainable Ensemble Learning Framework for Crop Classification with Optimized Feature Pyramids and Deep Networks
-
GIFT: Global Irreplaceability Frame Targeting for Efficient Video Understanding
-
Z-Erase: Enabling Concept Erasure in Single-Stream Diffusion Transformers
-
Sparse Visual Thought Circuits in Vision-Language Models
-
Bridging Perception and Reasoning: Token Reweighting for RLVR in Multimodal LLMs
-
Learning domain-invariant features through channel-level sparsification for Out-of-Distribution Generalization
-
Visual Attention Drifts, but Anchors Hold: Mitigating Hallucination in Multimodal Large Language Models via Cross-Layer Visual Anchors
-
THEMIS: Towards Holistic Evaluation of MLLMs for Scientific Paper Fraud Forensics
-
ETA-VLA: Efficient Token Adaptation via Temporal Fusion and Intra-LLM Sparsification for Vision-Language-Action Models
-
Pixelis: Reasoning in Pixels, from Seeing to Acting
-
Process-Aware AI for Rainfall-Runoff Modeling: A Mass-Conserving Neural Framework with Hydrological Process Constraints
-
ElephantBroker: A Knowledge-Grounded Cognitive Runtime for Trustworthy AI Agents
-
Large Language Models as Optimization Controllers: Adaptive Continuation for SIMP Topology Optimization
-
From Logic Monopoly to Social Contract: Separation of Power and the Institutional Foundations for Autonomous Agent Economies
-
Unlocking Strong Supervision: A Data-Centric Study of General-Purpose Audio Pre-Training Methods
-
UCAgent: An End-to-End Agent for Block-Level Functional Verification
-
Layer-Specific Lipschitz Modulation for Fault-Tolerant Multimodal Representation Learning
-
OMIND: Framework for Knowledge Grounded Finetuning and Multi-Turn Dialogue Benchmark for Mental Health LLMs
-
Label What Matters: Modality-Balanced and Difficulty-Aware Multimodal Active Learning
-
MSRL: Scaling Generative Multimodal Reward Modeling via Multi-Stage Reinforcement Learning
-
MoireMix: A Formula-Based Data Augmentation for Improving Image Classification Robustness
-
SEVerA: Verified Synthesis of Self-Evolving Agents
-
Deep Learning Aided Vision System for Planetary Rovers
-
Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory
-
A Comparative Investigation of Thermodynamic Structure-Informed Neural Networks
-
When Sensing Varies with Contexts: Context-as-Transform for Tactile Few-Shot Class-Incremental Learning
-
AnyDoc: Enhancing Document Generation via Large-Scale HTML/CSS Data Synthesis and Height-Aware Reinforcement Optimization
-
The Language of Touch: Translating Vibrations into Text with Dual-Branch Learning
-
MCLMR: A Model-Agnostic Causal Learning Framework for Multi-Behavior Recommendation
-
AirSplat: Alignment and Rating for Robust Feed-Forward 3D Gaussian Splatting
-
Denoise and Align: Towards Source-Free UDA for Robust Panoramic Semantic Segmentation
-
Robust Principal Component Completion
-
RubricEval: A Rubric-Level Meta-Evaluation Benchmark for LLM Judges in Instruction Following
-
EgoXtreme: A Dataset for Robust Object Pose Estimation in Egocentric Views under Extreme Conditions
-
Reinforcement learning for quantum processes with memory
-
SAVe: Self-Supervised Audio-visual Deepfake Detection Exploiting Visual Artifacts and Audio-visual Misalignment
-
IncreRTL: Traceability-Guided Incremental RTL Generation under Requirement Evolution
-
FD$^2$: A Dedicated Framework for Fine-Grained Dataset Distillation
-
ReCUBE: Evaluating Repository-Level Context Utilization in Code Generation
-
Learning to Rank Caption Chains for Video-Text Alignment
-
Factors Influencing the Quality of AI-Generated Code: A Synthesis of Empirical Evidence
-
Goodness-of-pronunciation without phoneme time alignment
-
UniAI-GraphRAG: Synergizing Ontology-Guided Extraction, Multi-Dimensional Clustering, and Dual-Channel Fusion for Robust Multi-Hop Reasoning
-
Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models
-
Vision Hopfield Memory Networks
-
Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
-
A Semantically Disentangled Unified Model for Multi-category 3D Anomaly Detection
-
SportSkills: Physical Skill Learning from Sports Instructional Videos
-
PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems
-
Towards Foundation Models for 3D Scene Understanding: Instance-Aware Self-Supervised Learning for Point Clouds
