Papers
-
Compositional Planning with Jumpy World Models
-
Wink: Recovering from Misbehaviors in Coding Agents
-
SARAH: Spatially Aware Real-time Agentic Humans
-
Image Generation with a Sphere Encoder
-
Learning to Reason in 13 Parameters
-
An Empirical Study on Noisy Data and LLM Pretraining Loss Divergence
-
ReasonCACHE: Teaching LLMs To Reason Without Weight Updates
-
Agentic Very Long Video Understanding
-
Unified Text-Image Generation with Weakness-Targeted Post-Training
-
Learning Latent Action World Models In The Wild
-
Agentic Reasoning for Large Language Models
-
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta
-
VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
-
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
-
GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation
-
World Models Can Leverage Human Videos for Dexterous Manipulation
-
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
-
UMA: A Family of Universal Models for Atoms
-
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
-
The Llama 3 Herd of Models
-
DINOv2: Learning Robust Visual Features without Supervision
-
Llama 2: Open Foundation and Fine-Tuned Chat Models
-
IMAGEBIND: One Embedding Space To Bind Them All
-
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
-
Segment Anything
-
LLaMA: Open and Efficient Foundation Language Models
-
Toolformer: Language Models Can Teach Themselves to Use Tools
-
Flow Matching for Generative Modeling
-
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
-
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
-
No Language Left Behind: Scaling Human-Centered Machine Translation
-
OPT: Open Pre-trained Transformer Language Models
-
PyTorch: An Imperative Style, High-Performance Deep Learning Library
-
RoBERTa: A Robustly Optimized BERT Pretraining Approach
-
Mask R-CNN
-
Billion-scale similarity search with GPUs
-
Bag of Tricks for Efficient Text Classification
-
Performance of Large Language Models in Answering Critical Care Medicine Questions
