Papers
-
Command A: An Enterprise-Ready Large Language Model
-
InteractRank: Personalized Web-Scale Search Pre-Ranking with Cross Interaction Features
-
Investigating the Overlooked Hessian Structure: From CNNs to LLMs
-
The Leaderboard Illusion
-
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
-
Perception Encoder: The best visual embeddings are not at the output of the network
-
Kimi-Audio Technical Report
-
Describe Anything: Detailed Localized Image and Video Captioning
-
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
-
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
-
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
-
How Does Critical Batch Size Scale in Pre-training?
-
Representation Engineering for Large-Language Models: Survey and Research Challenges
-
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
-
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models (SenseTime / Fudan University, Nanjing University, Shanghai Jiao Tong University, The Chinese University of Hong Kong, Tsinghua University)
-
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
-
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
-
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
-
How new data permeates LLM knowledge and how to dilute it
-
Migrating Code At Scale With LLMs At Google
-
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
-
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
-
PixelFlow: Pixel-Space Generative Models with Flow
-
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
-
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills
-
Gemini: A Family of Highly Capable Multimodal Models
-
SmolVLM: Redefining small and efficient multimodal models
-
One-Minute Video Generation with Test-Time Training
-
Data Scaling Laws for End-to-End Autonomous Driving
-
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
-
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
-
A Systematic Survey of Automatic Prompt Optimization Techniques
-
Scaling Language-Free Visual Representation Learning
-
Large Language Models Pass the Turing Test
-
XAMBA: SSMs on Edge NPUs
-
On the Biology of a Large Language Model
-
Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration
-
Qwen2.5-Omni Technical Report
-
Neuromorphic Principles for Efficient Large Language Models on Intel Loihi 2
-
ExCoT: Optimizing Reasoning for Text-to-SQL with Execution Feedback
-
Gemma 3 Technical Report
-
Debunking the CUDA Myth Towards GPU-based AI Systems
-
The Amazon Nova Family of Models: Technical Report and Model Card
-
StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error
-
Long Context Tuning for Video Generation
-
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
-
HunyuanVideo: A Systematic Framework For Large Video Generative Models
-
Learning to Search Effective Example Sequences for In-Context Learning
-
Gemini Embedding: Generalizable Embeddings from Gemini
