Papers
- CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling
- A Simple Baseline for Unifying Understanding, Generation, and Editing via Vanilla Next-token Prediction
- How to Train Your Deep Research Agent? Prompt, Reward, and Policy Optimization in Search-R1
- MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
- LUVE: Latent-Cascaded Ultra-High-Resolution Video Generation with Dual Frequency Experts
- ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training
- CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs
- Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory
- SpanNorm: Reconciling Training Stability and Performance in Deep Transformers
- EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
- Silence the Judge: Reinforcement Learning with Self-Verifier via Latent Geometric Clustering
- Efficient Context Scaling with LongCat ZigZag Attention
- EditThinker: Unlocking Iterative Reasoning for Any Image Editor
- Harvesting Efficient On-Demand Order Pooling from Skilled Couriers: Enhancing Graph Representation Learning for Refining Real-time Many-to-One Assignments
- Evaluating Object Hallucination in Large Vision-Language Models
- Decision-Making Context Interaction Network for Click-Through Rate Prediction
- Sampling Is All You Need on Modeling Long-Term User Behaviors for CTR Prediction
