Papers
- When Thinking Hurts: Mitigating Visual Forgetting in Video Reasoning via Frame Repetition
- Point-to-Mask: From Arbitrary Point Annotations to Mask-Level Infrared Small Target Detection
- Is Semi-Automatic Transcription Useful in Corpus Creation? Preliminary Considerations on the KIParla Corpus
- Human/AI Collective Intelligence for Deliberative Democracy: A Human-Centred Design Approach
- AW-MoE: All-Weather Mixture of Experts for Robust Multi-Modal 3D Object Detection
- Behavior-Centric Extraction of Scenarios from Highway Traffic Data and their Domain-Knowledge-Guided Clustering using CVQ-VAE
- Adaptive Theory of Mind for LLM-based Multi-Agent Coordination
- FG-SGL: Fine-Grained Semantic Guidance Learning via Motion Process Decomposition for Micro-Gesture Recognition
- CineSRD: Leveraging Visual, Acoustic, and Linguistic Cues for Open-World Visual Media Speaker Diarization
- VIGOR: VIdeo Geometry-Oriented Reward for Temporal Generative Alignment
- MSRAMIE: Multimodal Structured Reasoning Agent for Multi-instruction Image Editing
- Physics-integrated neural differentiable modeling for immersed boundary systems
- Laya: A LeJEPA Approach to EEG via Latent Prediction over Reconstruction
- Locate-then-Sparsify: Attribution Guided Sparse Strategy for Visual Hallucination Mitigation
- Persistent Story World Simulation with Continuous Character Customization
- Surrogate-Assisted Genetic Programming with Rank-Based Phenotypic Characterisation for Dynamic Multi-Mode Project Scheduling
- VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents
- Attention-guided Evidence Grounding for Spoken Question Answering
- PyPhonPlan: Simulating phonetic planning with dynamic neural fields and task dynamics
- Micro-AU CLIP: Fine-Grained Contrastive Learning from Local Independence to Global Dependency for Micro-Expression Action Unit Detection
- DriveFix: Spatio-Temporally Coherent Driving Scene Restoration
- NeSy-Route: A Neuro-Symbolic Benchmark for Constrained Route Planning in Remote Sensing
- Omnilingual MT: Machine Translation for 1,600 Languages
- Learning to Predict, Discover, and Reason in High-Dimensional Event Sequences
- DeepStage: Learning Autonomous Defense Policies Against Multi-Stage APT Campaigns
- A Human-Centred Architecture for Large Language Models-Cognitive Assistants in Manufacturing within Quality Management Systems
- An Interpretable Machine Learning Framework for Non-Small Cell Lung Cancer Drug Response Analysis
- Decoding the Critique Mechanism in Large Reasoning Models
- Behavioral Steering in a 35B MoE Language Model via SAE-Decoded Probe Vectors: One Agency Axis, Not Five Traits
- SpikeCLR: Contrastive Self-Supervised Learning for Few-Shot Event-Based Vision using Spiking Neural Networks
- Iris: Bringing Real-World Priors into Diffusion Model for Monocular Depth Estimation
- PKINet-v2: Towards Powerful and Efficient Poly-Kernel Remote Sensing Object Detection
- Detecting Sentiment Steering Attacks on RAG-enabled Large Language Models
- Learning Human-Object Interaction for 3D Human Pose Estimation from LiDAR Point Clouds
- Explainable machine learning workflows for radio astronomical data processing
- Automated identification of Ichneumonoidea wasps via YOLO-based deep learning: Integrating HiresCam for Explainable AI
- PashtoCorp: A 1.25-Billion-Word Corpus, Evaluation Suite, and Reproducible Pipeline for Low-Resource Language Development
- Toward Experimentation-as-a-Service in 5G/6G: The Plaza6G Prototype for AI-Assisted Trials
- $D^3$-RSMDE: 40$\times$ Faster and High-Fidelity Remote Sensing Monocular Depth Estimation
- Advancing Visual Reliability: Color-Accurate Underwater Image Enhancement for Real-Time Underwater Missions
- FactorEngine: A Program-level Knowledge-Infused Factor Mining Framework for Quantitative Investment
- DynamicGate MLP Conditional Computation via Learned Structural Dropout and Input Dependent Gating for Functional Plasticity
- Encoding Predictability and Legibility for Style-Conditioned Diffusion Policy
- FederatedFactory: Generative One-Shot Learning for Extremely Non-IID Distributed Scenarios
- InViC: Intent-aware Visual Cues for Medical Visual Question Answering
- Semantic One-Dimensional Tokenizer for Image Reconstruction and Generation
- Continual Multimodal Egocentric Activity Recognition via Modality-Aware Novel Detection
- Prior-Informed Neural Network Initialization: A Spectral Approach for Function Parameterizing Architectures
- Age Predictors Through the Lens of Generalization, Bias Mitigation, and Interpretability: Reflections on Causal Implications
- Controlling Fish Schools via Reinforcement Learning of Virtual Fish Movement
