Papers
-
Long Context Tuning for Video Generation
-
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
-
HunyuanVideo: A Systematic Framework For Large Video Generative Models
-
Learning to Search Effective Example Sequences for In-Context Learning
-
Gemini Embedding: Generalizable Embeddings from Gemini
-
Aya Vision: Expanding the worlds AI can see
-
OWLViz: An Open-World Benchmark for Visual Question Answering
-
Towards Statistical Factuality Guarantee for Large Vision-Language Models
-
Evaluating Nova 2.0 Lite model under Amazon’s Frontier Model Safety Framework
-
AI-Instruments: Embodying Prompts as Instruments to Abstract & Reflect Graphical Interface Commands as General-Purpose Tools
-
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
-
Muon is Scalable for LLM Training
-
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
-
AToken: A Unified Tokenizer for Vision
-
Qwen2.5-VL Technical Report
-
MoBA: Mixture of Block Attention for Long-Context LLMs
-
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
-
GraNNite: Enabling High-Performance Execution of Graph Neural Networks on Resource-Constrained Neural Processing Units
-
Reviving The Classics: Active Reward Modeling in Large Language Model Alignment
-
s1: Simple test-time scaling
-
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
-
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
-
EmbeddingGemma: Powerful and Lightweight Text Representations
-
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
-
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
-
MiniMax-01: Scaling Foundation Models with Lightning Attention
-
PoAct: Policy and Action Dual-Control Agent for Generalized Applications
-
Agent Laboratory: Using LLM Agents as Research Assistants
-
Retrieval-Augmented Generation with Graphs (GraphRAG)
-
Cosmos World Foundation Model Platform for Physical AI
-
Titans: Learning to Memorize at Test Time
-
Generative Video Propagation
-
In Case You Missed It: ARC 'Challenge' Is Not That Challenging
-
Qwen2.5 Technical Report
-
Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference
-
Alignment faking in large language models
-
How Often are Fingerprints Repeated in the Population? Expanding on Evidence from AI With the Birthday Paradox
-
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
-
VDB-GPDF: Online Gaussian Process Distance Field with VDB Structure
-
pfl-research: simulation framework for accelerating research in Private Federated Learning
-
Frontier AI systems have surpassed the self-replicating red line
-
InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention
-
Best-of-N Jailbreaking
-
Creating realistic 3D shapes using generative AI (Massachusetts Institute of Technology)
-
Commit0: Library Generation from Scratch
-
ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models and Large Language Models
-
Controlling Language and Diffusion Models by Transporting Activations
-
The Rise and Potential of Large Language Model Based Agents: A Survey (MIT)
-
Evaluating Cultural and Social Awareness of LLM Web Agents
-
SF-V: Single Forward Video Generation Model
