Papers
-
Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions
-
DSH-Bench: A Difficulty- and Scenario-Aware Benchmark with Hierarchical Subject Taxonomy for Subject-Driven Text-to-Image Generation
-
Model-based Offline RL via Robust Value-Aware Model Learning with Implicitly Differentiable Adaptive Weighting
Tencent / City University of Hong Kong, Hong Kong University of Science and Technology, University of Chicago
-
FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use
-
M$^3$-ACE: Rectifying Visual Perception in Multimodal Math Reasoning via Multi-Agentic Context Engineering
-
This Looks Distinctly Like That: Grounding Interpretable Recognition in Stiefel Geometry against Neural Collapse
-
SPIRAL: A Closed-Loop Framework for Self-Improving Action World Models via Reflective Planning Agents
Tencent / Chinese Academy of Sciences, Nanyang Technological University, National University of Singapore, Shanghai AI Lab, Zhejiang University
-
Evaluating Generative Models via One-Dimensional Code Distributions
-
EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation
-
Place-it-R1: Unlocking Environment-aware Reasoning Potential of MLLM for Video Object Insertion
-
FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling
-
Stem: Rethinking Causal Information Flow in Sparse Attention
-
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
-
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
-
OneRanker: Unified Generation and Ranking with One Model in Industrial Advertising Recommendation
-
NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing
Tencent / Nanjing University, The University of Hong Kong, University of Chinese Academy of Sciences
-
RubricBench: Aligning Model-Generated Rubrics with Human Standards
-
WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories
-
AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression
-
The Art of Efficient Reasoning: Data, Reward, and Optimization
-
Haitao Lin
-
HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation
-
GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training
-
OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention
Tencent / Hunan University, National University of Singapore, The Chinese University of Hong Kong, Tsinghua University, Xi'an Jiaotong University
-
Gradients Must Earn Their Influence: Unifying SFT with Generalized Entropic Objectives
-
MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE
-
RISE-Video: Can Video Generators Decode Implicit World Rules?
-
BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations
-
ReMiT: RL-Guided Mid-Training for Iterative LLM Evolution
-
HY3D-Bench: Generation of 3D Assets
-
Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models
-
HunyuanImage 3.0 Technical Report
-
MAIN-VLA: Modeling Abstraction of Intention and eNvironment for Vision-Language-Action Models
-
AlignGemini: Generalizable AI-Generated Image Detection Through Task-Model Alignment
Tencent / East China University of Science and Technology, Hong Kong University of Science and Technology, Shenzhen University
-
PI-Light: Physics-Inspired Diffusion for Full-Image Relighting
-
Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding
-
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
-
RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering
-
FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments
-
UniFinEval: Towards Unified Evaluation of Financial Multimodal Models across Text, Images and Videos
-
Rotate Your Character: Revisiting Video Diffusion Models for High-Quality 3D Character Generation
-
One Language-Free Foundation Model Is Enough for Universal Vision Anomaly Detection
-
DocDancer: Towards Agentic Document-Grounded Information Seeking
-
Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing
-
FOREVER: Forgetting Curve-Inspired Memory Replay for Language Model Continual Learning
-
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents
-
A Versatile Multimodal Agent for Multimedia Content Generation
-
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
-
YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection
-
HY-MT1.5 Technical Report
