Papers
-
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation
-
Wan-Weaver: Interleaved Multi-modal Generation via Decoupled Training
-
TRACE: Object Motion Editing in Videos with First-Frame Trajectory Guidance
-
Seeing to Ground: Visual Attention for Hallucination-Resilient MDLLMs
-
Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?
-
R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning
-
No Hard Negatives Required: Concept Centric Learning Leads to Compositionality without Degrading Zero-shot Capabilities of Contrastive Models
-
AnyHand: A Large-Scale Synthetic Dataset for RGB(-D) Hand Pose Estimation
-
Back to Basics: Revisiting ASR in the Age of Voice Agents
-
PixelSmile: Toward Fine-Grained Facial Expression Editing
-
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference
-
BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation
-
SlotVTG: Object-Centric Adapter for Generalizable Video Temporal Grounding
-
Unleashing Guidance Without Classifiers for Human-Object Interaction Animation
-
Training the Knowledge Base through Evidence Distillation and Write-Back Enrichment
-
How good was my shot? Quantifying Player Skill Level in Table Tennis
-
MegaFlow: Zero-Shot Large Displacement Optical Flow
-
PSDesigner: Automated Graphic Design with a Human-Like Creative Workflow
-
Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving
-
Vega: Learning to Drive with Natural Language Instructions
-
RefAlign: Representation Alignment for Reference-to-Video Generation
-
MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models
-
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting
-
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
-
ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions
-
Vision Transformers and Graph Neural Networks for Charged Particle Tracking in the ATLAS Muon Spectrometer
-
Beyond identifiability: Learning causal representations with few environments and finite samples
-
Resolving the Robustness-Precision Trade-off in Financial RAG through Hybrid Document-Routed Retrieval
-
End-to-end Feature Alignment: A Simple CNN with Intrinsic Class Attribution
-
LEMON: a foundation model for nuclear morphology in Computational Pathology
-
Do All Vision Transformers Need Registers? A Cross-Architectural Reassessment
-
RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation
-
ExVerus: Verus Proof Repair via Counterexample Reasoning
-
MAGNET: Autonomous Expert Model Generation via Decentralized Autoresearch and BitNet Training
-
Geo²: Geometry-Guided Cross-view Geo-Localization and Image Synthesis
-
Doctorina MedBench: End-to-End Evaluation of Agent-Based Medical AI
-
ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?
-
Fus3D: Decoding Consolidated 3D Geometry from Feed-forward Geometry Transformer Latents
-
A Neural Score-Based Particle Method for the Vlasov-Maxwell-Landau System
-
Gradient-Informed Training for Low-Resource Multilingual Speech Translation
-
A Compression Perspective on Simplicity Bias
-
GazeQwen: Lightweight Gaze-Conditioned LLM Modulation for Streaming Video Understanding
-
Self-Organized Optical Pathways in Optofluidic Photonic Crystals
-
Incorporating contextual information into KGWAS for interpretable GWAS discovery
-
In-Context Molecular Property Prediction with LLMs: A Blinding Study on Memorization and Knowledge Conflicts
-
On the Expressive Power of Contextual Relations in Transformers
-
Why Safety Probes Catch Liars But Miss Fanatics
-
Methods for Knowledge Graph Construction from Text Collections: Development and Applications
-
Dynamic LIBRAS Gesture Recognition via CNN over Spatiotemporal Matrix Representation
-
GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks
