Papers
-
Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias
-
ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning
-
Robust Post-Training for Generative Recommenders: Why Exponential Reward-Weighted SFT Outperforms RLHF
-
Towards a Neural Debugger for Python
-
DARC: Disagreement-Aware Alignment via Risk-Constrained Decoding
-
Offline Materials Optimization with CliqueFlowmer
-
CHMv2: Improvements in Global Canopy Height Mapping using DINOv3
-
FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling
-
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning
-
InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions
-
Beyond Language Modeling: An Exploration of Multimodal Pretraining
-
Agentic Code Reasoning
-
Compositional Planning with Jumpy World Models
-
Wink: Recovering from Misbehaviors in Coding Agents
-
SARAH: Spatially Aware Real-time Agentic Humans
-
Image Generation with a Sphere Encoder
-
Learning to Reason in 13 Parameters
-
An Empirical Study on Noisy Data and LLM Pretraining Loss Divergence
-
ReasonCACHE: Teaching LLMs To Reason Without Weight Updates
-
Agentic Very Long Video Understanding
-
Unified Text-Image Generation with Weakness-Targeted Post-Training
-
Learning Latent Action World Models In The Wild (Meta Platforms / National Institute for Research in Digital Science and Technology, New York University)
-
Agentic Reasoning for Large Language Models
-
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta
-
VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
-
Diffusion Forcing for Multi-Agent Interaction Sequence Modeling
-
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
-
GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation
-
World Models Can Leverage Human Videos for Dexterous Manipulation
-
Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases
-
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models (Meta Platforms / King Abdullah University of Science and Technology (KAUST), The University of Hong Kong, University of Waterloo)
-
UMA: A Family of Universal Models for Atoms
-
Transformers without Normalization
-
Multi-Token Attention
-
VGGT: Visual Geometry Grounded Transformer
-
Perception Encoder: The best visual embeddings are not at the output of the network
-
Scaling Language-Free Visual Representation Learning
-
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
-
The Llama 3 Herd of Models
-
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
-
DINOv2: Learning Robust Visual Features without Supervision
-
Llama 2: Open Foundation and Fine-Tuned Chat Models
-
IMAGEBIND: One Embedding Space To Bind Them All
-
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
-
Segment Anything
-
LLaMA: Open and Efficient Foundation Language Models
-
Toolformer: Language Models Can Teach Themselves to Use Tools
-
Flow Matching for Generative Modeling
-
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
