Papers
-
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
-
PixelFlow: Pixel-Space Generative Models with Flow
-
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
-
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills
-
Gemini: A Family of Highly Capable Multimodal Models
-
SmolVLM: Redefining small and efficient multimodal models
-
One-Minute Video Generation with Test-Time Training
-
Data Scaling Laws for End-to-End Autonomous Driving
-
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
-
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
-
A Systematic Survey of Automatic Prompt Optimization Techniques
-
Scaling Language-Free Visual Representation Learning
-
Large Language Models Pass the Turing Test
-
XAMBA: SSMs on Edge NPUs
-
On the Biology of a Large Language Model
-
Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration
-
Qwen2.5-Omni Technical Report
-
Neuromorphic Principles for Efficient Large Language Models on Intel Loihi 2
-
ExCoT: Optimizing Reasoning for Text-to-SQL with Execution Feedback
-
Gemma 3 Technical Report
-
Debunking the CUDA Myth Towards GPU-based AI Systems
-
The Amazon Nova Family of Models: Technical Report and Model Card
-
StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error
-
Long Context Tuning for Video Generation
-
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
-
HunyuanVideo: A Systematic Framework For Large Video Generative Models
-
Learning to Search Effective Example Sequences for In-Context Learning
-
Gemini Embedding: Generalizable Embeddings from Gemini
-
Aya Vision: Expanding the worlds AI can see
-
OWLViz: An Open-World Benchmark for Visual Question Answering
-
Towards Statistical Factuality Guarantee for Large Vision-Language Models
-
Evaluating Nova 2.0 Lite model under Amazon’s Frontier Model Safety Framework
-
AI-Instruments: Embodying Prompts as Instruments to Abstract & Reflect Graphical Interface Commands as General-Purpose Tools
-
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
-
Muon is Scalable for LLM Training
-
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
-
AToken: A Unified Tokenizer for Vision
-
Qwen2.5-VL Technical Report
-
MoBA: Mixture of Block Attention for Long-Context LLMs
-
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
-
GraNNite: Enabling High-Performance Execution of Graph Neural Networks on Resource-Constrained Neural Processing Units
-
Reviving The Classics: Active Reward Modeling in Large Language Model Alignment
-
s1: Simple test-time scaling
-
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
-
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
-
EmbeddingGemma: Powerful and Lightweight Text Representations
-
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
-
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
-
MiniMax-01: Scaling Foundation Models with Lightning Attention
-
PoAct: Policy and Action Dual-Control Agent for Generalized Applications
