Papers
-
Pose-Free Omnidirectional Gaussian Splatting for 360-Degree Videos with Consistent Depth Priors
-
ViBe: Ultra-High-Resolution Video Synthesis Born from Pure Images
-
Strain-Parameterized Coupled Dynamics and Dual-Camera Visual Servoing for Aerial Continuum Manipulators
-
Edge Radar Material Classification Under Geometry Shifts
-
An Explainable AI-Driven Framework for Automated Brain Tumor Segmentation Using an Attention-Enhanced U-Net
-
FHAvatar: Fast and High-Fidelity Reconstruction of Face-and-Hair Composable 3D Head Avatar from Few Casual Captures
-
RelayS2S: A Dual-Path Speculative Generation for Real-Time Dialogue
-
Off-Policy Value-Based Reinforcement Learning for Large Language Models
-
Contrastive Metric Learning for Point Cloud Segmentation in Highly Granular Detectors
-
Central Dogma Transformer III: Interpretable AI Across DNA, RNA, and Protein
-
Object Pose Transformer: Unifying Unseen Object Pose Estimation
-
Natural Language Interfaces for Spatial and Temporal Databases: A Comprehensive Overview of Methods, Taxonomy, and Future Directions
-
ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment
-
FG-Portrait: 3D Flow Guided Editable Portrait Animation
-
From Feature Learning to Spectral Basis Learning: A Unifying and Flexible Framework for Efficient and Robust Shape Matching
-
SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM
-
LineMVGNN: Anti-Money Laundering with Line-Graph-Assisted Multi-View Graph Neural Networks
-
Harnessing Lightweight Transformer with Contextual Synergic Enhancement for Efficient 3D Medical Image Segmentation
-
Graph Energy Matching: Transport-Aligned Energy-Based Modeling for Graph Generation
-
Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning
-
Planning over MAPF Agent Dependencies via Multi-Dependency PIBT
-
Beyond Preset Identities: How Agents Form Stances and Boundaries in Generative Societies
-
GeoSANE: Learning Geospatial Representations from Models, Not Data
-
I3DM: Implicit 3D-aware Memory Retrieval and Injection for Consistent Video Scene Generation
-
SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling
-
Biased Error Attribution in Multi-Agent Human-AI Systems Under Delayed Feedback
-
Bilevel Autoresearch: Meta-Autoresearching Itself
-
Mecha-nudges for Machines
-
Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning
-
Targeted Adversarial Traffic Generation: Black-box Approach to Evade Intrusion Detection Systems in IoT Networks
-
SIGMA: A Physics-Based Benchmark for Gas Chimney Understanding in Seismic Images
-
Evaluating LLM-Based Test Generation Under Software Evolution
-
3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding
-
Code Review Agent Benchmark
-
DetPO: In-Context Learning with Multi-Modal LLMs for Few-Shot Object Detection
-
CSTS: A Canonical Security Telemetry Substrate for AI-Native Cyber Detection
-
End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions
-
RealMaster: Lifting Rendered Scenes into Photorealistic Video
-
InverFill: One-Step Inversion for Enhanced Few-Step Diffusion Inpainting
-
Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions
-
UniFunc3D: Unified Active Spatial-Temporal Grounding for 3D Functionality Segmentation
-
VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs
-
ReqFusion: A Multi-Provider Framework for Automated PEGS Analysis Across Software Domains
-
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
-
Failure of contextual invariance in gender inference with large language models
-
TETO: Tracking Events with Teacher Observation for Motion Estimation and Frame Interpolation
-
One View Is Enough! Monocular Training for In-the-Wild Novel View Generation
-
AgentRVOS: Reasoning over Object Tracks for Zero-Shot Referring Video Object Segmentation
-
Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation
-
VISion On Request: Enhanced VLLM efficiency with sparse, dynamically selected, vision-language interactions
