Papers
- Agent Laboratory: Using LLM Agents as Research Assistants
- CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs
- AlphaEvolve: A coding agent for scientific and algorithmic discovery
- Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences
- Transformers without Normalization
- T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
- Ministral 3
- The Behavior Gap: Evaluating Zero-shot LLM Agents in Complex Task-Oriented Dialogs
- A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
- Magistral
- Adobe Researchers present a powerful, unified approach to generative video editing at CVPR 2025
- Multi-Token Attention
- Transaction Categorization with Relational Deep Learning in QuickBooks
- Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
- Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
- Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation
- Modality-Specialized Synergizers for Interleaved Vision-Language Generalists
- Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
- Splat and Replace: 3D Reconstruction with Repetitive Elements
- FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
- HybGRAG: Hybrid Retrieval-Augmented Generation on Textual and Relational Knowledge Bases
- SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models
- M+: Extending MemoryLLM with Scalable Long-Term Memory
- Skywork Open Reasoner 1 Technical Report
- GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
- More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives
- Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach
- Autoregressive Speech Synthesis without Vector Quantization
- Vision as LoRA
- syftr: Pareto-Optimal Generative AI
- SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
- Gemini Robotics: Bringing AI into the Physical World
- OmniGenBench: A Benchmark for Omnipotent Multimodal Generation across 50+ Tasks
- One RL to See Them All: Visual Triple Unified Reinforcement Learning
- GiGL: Large-Scale Graph Neural Networks at Snapchat
- From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition
- Model Merging in Pre-training of Large Language Models
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
- M-RewardBench: Evaluating Reward Models in Multilingual Settings
- Lessons from Defending Gemini Against Indirect Prompt Injections
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning
- Progressive Autoregressive Video Diffusion Models
- FastVLM: Efficient Vision Encoding for Vision Language Models
- VGGT: Visual Geometry Grounded Transformer
- Qwen3 Technical Report
- The Leaderboard Illusion
- MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
- LLMs Get Lost In Multi-Turn Conversation
- A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well
