Papers
- How do LLMs Compute Verbal Confidence
- Video Understanding: From Geometry and Semantics to Unified Models
- Omni-3DEdit: Generalized Versatile 3D Editing in One-Pass
- Revisiting foundation models for cell instance segmentation
- Physics-Aware Machine Learning for Seismic and Volcanic Signal Interpretation
- VISER: Visually-Informed System for Enhanced Robustness in Open-Set Iris Presentation Attack Detection
- Procedural Generation of Algorithm Discovery Tasks in Machine Learning
- RHYME-XT: A Neural Operator for Spatiotemporal Control Systems
- Mitigating LLM Hallucinations through Domain-Grounded Tiered Retrieval
- Operator-Theoretic Foundations and Policy Gradient Methods for General MDPs with Unbounded Costs
- Edit Spillover as a Probe: Do Image Editing Models Implicitly Understand World Relations?
- Differential Attention-Augmented BiomedCLIP with Asymmetric Focal Optimization for Imbalanced Multi-Label Video Capsule Endoscopy Classification
- DebugLM: Learning Traceable Training Data Provenance for LLMs
- AI-Assisted Goal Setting Improves Goal Progress Through Social Accountability
- Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation
- MAED: Mathematical Activation Error Detection for Mitigating Physical Fault Attacks in DNN Inference
- RAMP: Reinforcement Adaptive Mixed Precision Quantization for Efficient On Device LLM Inference
- scicode-lint: Detecting Methodology Bugs in Scientific Python Code with LLM-Generated Patterns
- A Creative Agent is Worth a 64-Token Template
- A Noise Sensitivity Exponent Controls Large Statistical-to-Computational Gaps in Single- and Multi-Index Models
- Differential Privacy in Generative AI Agents: Analysis and Optimal Tradeoffs
- Don't Vibe Code, Do Skele-Code: Interactive No-Code Notebooks for Subject Matter Experts to Build Lower-Cost Agentic Workflows
- Understanding Task Aggregation for Generalizable Ultrasound Foundation Models
- SpiderCam: Low-Power Snapshot Depth from Differential Defocus
- Pretrained Multilingual Transformers Reveal Quantitative Distance Between Human Languages
- Noise-Aware Misclassification Attack Detection in Collaborative DNN Inference
- IndicSafe: A Benchmark for Evaluating Multilingual LLM Safety in South Asia
- Only relative ranks matter in weight-clustered large language models
- SegFly: A 2D-3D-2D Paradigm for Aerial RGB-Thermal Semantic Segmentation at Scale
- Multi-Armed Sequential Hypothesis Testing by Betting
- A practical artificial intelligence framework for legal age estimation using clavicle computed tomography scans
- Interpretable Traffic Responsibility from Dashcam Video via Legal Multi Agent Reasoning
- Evaluating FrameNet-Based Semantic Modeling for Gender-Based Violence Detection in Clinical Records
- Efficient Training-Free Multi-Token Prediction via Embedding-Space Probing
- TransText: Alpha-as-RGB Representation for Transparent Text Animation
- ShapleyLaw: A Game-Theoretic Approach to Multilingual Scaling Laws
- CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention
- Unified Policy Value Decomposition for Rapid Adaptation
- VideoAtlas: Navigating Long-Form Video in Logarithmic Compute
- Gender Disambiguation in Machine Translation: Diagnostic Evaluation in Decoder-Only Architectures
- ConGA: Guidelines for Contextual Gender Annotation. A Framework for Annotating Gender in Machine Translation
- LaDe: Unified Multi-Layered Graphic Media Generation and Decomposition
- Robust-ComBat: Mitigating Outlier Effects in Diffusion MRI Data Harmonization
- Specification-Aware Distribution Shaping for Robotics Foundation Models
- Beyond Muon: MUD (MomentUm Decorrelation) for Faster Transformer Training
- TDAD: Test-Driven Agentic Development - Reducing Code Regressions in AI Coding Agents via Graph-Based Impact Analysis
- Toward Scalable Automated Repository-Level Datasets for Software Vulnerability Detection
- AHOY! Animatable Humans under Occlusion from YouTube Videos with Gaussian Splatting and Video Diffusion Priors
- AdaRadar: Rate Adaptive Spectral Compression for Radar-based Perception
- Feeling the Space: Egomotion-Aware Video Representation for Efficient and Accurate 3D Scene Understanding
