Papers
-
Vision Verification Enhanced Fusion of VLMs for Efficient Visual Reasoning
-
Spatially Grounded Long-Horizon Task Planning in the Wild
-
Disentangled Latent Dynamics Manifold Fusion for Solving Parameterized PDEs
-
MetaKE: Meta-learning Aligned Knowledge Editing via Bi-level Optimization
-
G2HFNet: GeoGran-Aware Hierarchical Feature Fusion Network for Salient Object Detection in Optical Remote Sensing Images
-
Colluding LoRA: A Composite Attack on LLM Safety Alignment
-
Experimental evidence of progressive ChatGPT models self-convergence
-
Federated Hierarchical Clustering with Automatic Selection of Optimal Cluster Numbers
-
RSONet: Region-guided Selective Optimization Network for RGB-T Salient Object Detection
-
STRAP-ViT: Segregated Tokens with Randomized Transformations for Defense against Adversarial Patches in ViTs
-
CM-Bench: A Comprehensive Cross-Modal Feature Matching Benchmark Bridging Visible and Infrared Images
-
HSEmotion Team at ABAW-10 Competition: Facial Expression Recognition, Valence-Arousal Estimation, Action Unit Detection and Fine-Grained Violence Classification
-
RXNRECer Enables Fine-grained Enzymatic Function Annotation through Active Learning and Protein Language Models
-
HaltNav: Reactive Visual Halting over Lightweight Topological Priors for Robust Vision-Language Navigation
-
EvolveCoder: Evolving Test Cases via Adversarial Verification for Code Reinforcement Learning
-
Seeing Eye to Eye: Enabling Cognitive Alignment Through Shared First-Person Perspective in Human-AI Collaboration
-
FGTR: Fine-Grained Multi-Table Retrieval via Hierarchical LLM Reasoning
-
VCBench: A Streaming Counting Benchmark for Spatial-Temporal State Maintenance in Long Videos
-
Cost-Efficient Multimodal LLM Inference via Cross-Tier GPU Heterogeneity
-
HFP-SAM: Hierarchical Frequency Prompted SAM for Efficient Marine Animal Segmentation
-
AI Planning Framework for LLM-Based Web Agents
-
Text-Phase Synergy Network with Dual Priors for Unsupervised Cross-Domain Image Retrieval
-
Design-Specification Tiling for ICL-based CAD Code Generation
-
Deep Learning Based Estimation of Blood Glucose Levels from Multidirectional Scleral Blood Vessel Imaging
-
UNIStainNet: Foundation-Model-Guided Virtual Staining of H&E to IHC
-
Altered Thoughts, Altered Actions: Probing Chain-of-Thought Vulnerabilities in VLA Robotic Manipulation
-
The COTe score: A decomposable framework for evaluating Document Layout Analysis models
-
IGASA: Integrated Geometry-Aware and Skip-Attention Modules for Enhanced Point Cloud Registration
-
CMHANet: A Cross-Modal Hybrid Attention Network for Point Cloud Registration
-
CognitionCapturerPro: Towards High-Fidelity Visual Decoding from EEG/MEG via Multi-modal Information and Asymmetric Alignment
-
SciDesignBench: Benchmarking and Improving Language Models for Scientific Inverse Design
-
Graph In-Context Operator Networks for Generalizable Spatiotemporal Prediction
-
Anchored Alignment: Preventing Positional Collapse in Multimodal Recommender Systems
-
On Using Machine Learning to Early Detect Catastrophic Failures in Marine Diesel Engines
-
VecMol: Vector-Field Representations for 3D Molecule Generation
-
SRAM-Based Compute-in-Memory Accelerator for Linear-decay Spiking Neural Networks
-
ToolTree: Efficient LLM Agent Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning
-
MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization
-
TaoBench: Do Automated Theorem Prover LLMs Generalize Beyond MathLib?
-
Music Source Restoration with Ensemble Separation and Targeted Reconstruction
-
Thinking in Dynamics: How Multimodal Large Language Models Perceive, Track, and Reason Dynamics in Physical 4D World
-
Modality-free Graph In-context Alignment
-
SLICE: Semantic Latent Injection via Compartmentalized Embedding for Image Watermarking
-
Show, Don't Tell: Detecting Novel Objects by Watching Human Videos
-
Taming the Long Tail: Efficient Item-wise Sharpness-Aware Minimization for LLM-based Recommender Systems
-
A Method for Learning Large-Scale Computational Construction Grammars from Semantically Annotated Corpora
-
AI Model Modulation with Logits Redistribution
-
FC-Track: Overlap-Aware Post-Association Correction for Online Multi-Object Tracking
-
SAP: Segment Any 4K Panorama
-
HIFICL: High-Fidelity In-Context Learning for Multimodal Tasks
