Papers
-
D2Pruner: Debiased Importance and Structural Diversity for MLLM Token Pruning
-
Streaming Video Instruction Tuning
-
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing
-
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
-
AutoRefiner: Improving Autoregressive Video Diffusion Models via Reflective Refinement Over the Stochastic Sampling Path
-
Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation
-
Transform Trained Transformer: Accelerating Naive 4K Video Generation Over 10
-
GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training
-
Distribution Matching Variational AutoEncoder
-
HunyuanVideo 1.5 Technical Report
-
Training-Free Group Relative Policy Optimization
-
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
-
HunyuanVideo: A Systematic Framework For Large Video Generative Models
-
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis (Tencent / Hong Kong University of Science and Technology, Monash University, Peking University, The Chinese University of Hong Kong)
-
Reinforced Curriculum Pre-Alignment for Domain-Adaptive VLMs
