Papers
-
ExCoT: Optimizing Reasoning for Text-to-SQL with Execution Feedback
-
Gemma 3 Technical Report
-
Debunking the CUDA Myth Towards GPU-based AI Systems
-
The Amazon Nova Family of Models: Technical Report and Model Card
-
StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error
-
Long Context Tuning for Video Generation
-
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
-
HunyuanVideo: A Systematic Framework For Large Video Generative Models
-
Learning to Search Effective Example Sequences for In-Context Learning
-
Gemini Embedding: Generalizable Embeddings from Gemini
-
Aya Vision: Expanding the worlds AI can see
-
OWLViz: An Open-World Benchmark for Visual Question Answering
-
Towards Statistical Factuality Guarantee for Large Vision-Language Models
-
Evaluating Nova 2.0 Lite model under Amazon’s Frontier Model Safety Framework
-
AI-Instruments: Embodying Prompts as Instruments to Abstract & Reflect Graphical Interface Commands as General-Purpose Tools
-
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
-
Muon is Scalable for LLM Training
-
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
-
AToken: A Unified Tokenizer for Vision
-
Qwen2.5-VL Technical Report
-
MoBA: Mixture of Block Attention for Long-Context LLMs
-
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
-
GraNNite: Enabling High-Performance Execution of Graph Neural Networks on Resource-Constrained Neural Processing Units
-
Reviving The Classics: Active Reward Modeling in Large Language Model Alignment
-
s1: Simple test-time scaling (Contextual AI / Allen Institute for Artificial Intelligence, Stanford University, University of Washington)
-
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
-
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
-
EmbeddingGemma: Powerful and Lightweight Text Representations
-
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
-
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
-
MiniMax-01: Scaling Foundation Models with Lightning Attention
-
PoAct: Policy and Action Dual-Control Agent for Generalized Applications
-
Agent Laboratory: Using LLM Agents as Research Assistants
-
Retrieval-Augmented Generation with Graphs (GraphRAG)
-
Cosmos World Foundation Model Platform for Physical AI
-
Titans: Learning to Memorize at Test Time
-
Generative Video Propagation
-
In Case You Missed It: ARC 'Challenge' Is Not That Challenging
-
Qwen2.5 Technical Report
-
Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference
-
Alignment faking in large language models
-
How Often are Fingerprints Repeated in the Population? Expanding on Evidence from AI With the Birthday Paradox (University of Pennsylvania Department of Criminology and Statistics, University of Pennsylvania School of Engineering and Applied Sciences)
-
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
-
VDB-GPDF: Online Gaussian Process Distance Field with VDB Structure
-
pfl-research: simulation framework for accelerating research in Private Federated Learning
-
Frontier AI systems have surpassed the self-replicating red line (Fudan University)
-
InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention
-
Best-of-N Jailbreaking
-
Creating realistic 3D shapes using generative AI (Massachusetts Institute of Technology)
-
Commit0: Library Generation from Scratch
