Papers
- CtrlAttack: A Unified Attack on World-Model Control in Diffusion Models
- SAVA-X: Ego-to-Exo Imitation Error Detection via Scene-Adaptive View Alignment and Bidirectional Cross View Fusion
- Catalyst4D: High-Fidelity 3D-to-4D Scene Editing via Dynamic Propagation
- SectEval: Evaluating the Latent Sectarian Preferences of Large Language Models
- PVI: Plug-in Visual Injection for Vision-Language-Action Models
- Empowering Semantic-Sensitive Underwater Image Enhancement with VLM
- The RIGID Framework: Research-Integrated, Generative AI-Mediated Instructional Design
- Upper Bounds for Local Learning Coefficients of Three-Layer Neural Networks
- Generalized Recognition of Basic Surgical Actions Enables Skill Assessment and Vision-Language-Model-based Surgical Planning
- Think and Answer ME: Benchmarking and Exploring Multi-Entity Reasoning Grounding in Remote Sensing
- Coherent Human-Scene Reconstruction from Multi-Person Multi-View Video in a Single Pass
- Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
- A Fractional Fox H-Function Kernel for Support Vector Machines: Robust Classification via Weighted Transmutation Operators
- SteerRM: Debiasing Reward Models via Sparse Autoencoders
- Spectral Defense Against Resource-Targeting Attack in 3D Gaussian Splatting
- What Makes VLMs Robust? Towards Reconciling Robustness and Accuracy in Vision-Language Models
- GLEAM: A Multimodal Imaging Dataset and HAMM for Glaucoma Classification
- A Multi-task Large Reasoning Model for Molecular Science
- OARS: Process-Aware Online Alignment for Generative Real-World Image Super-Resolution
- Context is all you need: Towards autonomous model-based process design using agentic AI in flowsheet simulations
- Residual SODAP: Residual Self-Organizing Domain-Adaptive Prompting with Structural Knowledge Preservation for Continual Learning
- Adaptive Vision-Language Model Routing for Computer Use Agents
- NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval
- Rethinking Multiple-Choice Questions for RLVR: Unlocking Potential via Distractor Design
- From AI Weather Prediction to Infrastructure Resilience: A Correction-Downscaling Framework for Tropical Cyclone Impacts
- coDrawAgents: A Multi-Agent Dialogue Framework for Compositional Image Generation
- Hierarchical Dual-Change Collaborative Learning for UAV Scene Change Captioning
- Mask2Flow-TSE: Two-Stage Target Speaker Extraction with Masking and Flow Matching
- DAST: A Dual-Stream Voice Anonymization Attacker with Staged Training
- Multimodal Protein Language Models for Enzyme Kinetic Parameters: From Substrate Recognition to Conformational Adaptation
- Hierarchical Reference Sets for Robust Unsupervised Detection of Scattered and Clustered Outliers
- Vision-Language Based Expert Reporting for Painting Authentication and Defect Detection
- Team LEYA in 10th ABAW Competition: Multimodal Ambivalence/Hesitancy Recognition Approach
- On Linear Separability of the MNIST Handwritten Digits Dataset
- Draft-and-Target Sampling for Video Generation Policy
- Wear Classification of Abrasive Flap Wheels using a Hierarchical Deep Learning Approach
- I Know What I Don't Know: Latent Posterior Factor Models for Multi-Evidence Probabilistic Reasoning
- Composing Driving Worlds through Disentangled Control for Adversarial Scenario Generation
- Surrogates for Physics-based and Data-driven Modelling of Parametric Systems: Review and New Perspectives
- CLARIN-PT-LDB: An Open LLM Leaderboard for Portuguese to assess Language, Culture and Civility
- TRACE: Structure-Aware Character Encoding for Robust and Generalizable Document Watermarking
- Test-time RL alignment exposes task familiarity artifacts in LLM benchmarks
- Explainable AI Using Inherently Interpretable Components for Wearable-based Health Monitoring
- Enhanced Drug-drug Interaction Prediction Using Adaptive Knowledge Integration
- A protocol for evaluating robustness to H&E staining variation in computational pathology models
- Forecasting Epileptic Seizures from Contactless Camera via Cross-Species Transfer Learning
- Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models
- Human-Centered Evaluation of an LLM-Based Process Modeling Copilot: A Mixed-Methods Study with Domain Experts
- A theory of learning data statistics in diffusion models, from easy to hard
- Spectral-Geometric Neural Fields for Pose-Free LiDAR View Synthesis
