Papers
- Object Pose Transformer: Unifying Unseen Object Pose Estimation
- Natural Language Interfaces for Spatial and Temporal Databases: A Comprehensive Overview of Methods, Taxonomy, and Future Directions
- ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment
- FG-Portrait: 3D Flow Guided Editable Portrait Animation
- From Feature Learning to Spectral Basis Learning: A Unifying and Flexible Framework for Efficient and Robust Shape Matching
- SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM
- LineMVGNN: Anti-Money Laundering with Line-Graph-Assisted Multi-View Graph Neural Networks
- Harnessing Lightweight Transformer with Contextual Synergic Enhancement for Efficient 3D Medical Image Segmentation
- Graph Energy Matching: Transport-Aligned Energy-Based Modeling for Graph Generation
- Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning
- Planning over MAPF Agent Dependencies via Multi-Dependency PIBT
- Beyond Preset Identities: How Agents Form Stances and Boundaries in Generative Societies
- GeoSANE: Learning Geospatial Representations from Models, Not Data
- I3DM: Implicit 3D-aware Memory Retrieval and Injection for Consistent Video Scene Generation
- SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling
- Biased Error Attribution in Multi-Agent Human-AI Systems Under Delayed Feedback
- Bilevel Autoresearch: Meta-Autoresearching Itself
- Mecha-nudges for Machines
- Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning
- Targeted Adversarial Traffic Generation: Black-box Approach to Evade Intrusion Detection Systems in IoT Networks
- SIGMA: A Physics-Based Benchmark for Gas Chimney Understanding in Seismic Images
- Evaluating LLM-Based Test Generation Under Software Evolution
- 3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding
- Code Review Agent Benchmark
- DetPO: In-Context Learning with Multi-Modal LLMs for Few-Shot Object Detection
- CSTS: A Canonical Security Telemetry Substrate for AI-Native Cyber Detection
- End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions
- RealMaster: Lifting Rendered Scenes into Photorealistic Video
- InverFill: One-Step Inversion for Enhanced Few-Step Diffusion Inpainting
- Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions
- UniFunc3D: Unified Active Spatial-Temporal Grounding for 3D Functionality Segmentation
- VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs
- ReqFusion: A Multi-Provider Framework for Automated PEGS Analysis Across Software Domains
- SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
- Failure of contextual invariance in gender inference with large language models
- TETO: Tracking Events with Teacher Observation for Motion Estimation and Frame Interpolation
- One View Is Enough! Monocular Training for In-the-Wild Novel View Generation
- AgentRVOS: Reasoning over Object Tracks for Zero-Shot Referring Video Object Segmentation
- Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation
- VISion On Request: Enhanced VLLM efficiency with sparse, dynamically selected, vision-language interactions
- Estimating Flow Velocity and Vehicle Angle-of-Attack from Non-invasive Piezoelectric Structural Measurements Using Deep Learning
- WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG
- DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models
- UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation
- MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage
- OccAny: Generalized Unconstrained Urban 3D Occupancy
- LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset
- Environment Maps: Structured Environmental Representations for Long-Horizon Agents
- LLMORPH: Automated Metamorphic Testing of Large Language Models
- LLMLOOP: Improving LLM-Generated Code and Tests through Automated Iterative Feedback Loops