Papers
-
EVMbench: Evaluating AI Agents on Smart Contract Security
-
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
-
Using a GPT-5-driven autonomous lab to optimize the cost and titer of cell-free protein synthesis
-
Addendum to GPT-5.2 System Card: GPT-5.2-Codex
-
Monitoring Monitorability
-
Evaluating AI’s ability to perform scientific research tasks
-
On Learning-Curve Monotonicity for Maximum Likelihood Estimators
-
Training LLMs for Honesty via Confessions
-
Early Science Acceleration Experiments with GPT-5
-
Weight-Sparse Transformers Have Interpretable Circuits
-
GPT-4 Technical Report
-
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
-
Robust Speech Recognition via Large-Scale Weak Supervision
-
Hierarchical Text-Conditional Image Generation with CLIP Latents
-
Training Language Models to Follow Instructions with Human Feedback
-
Training Verifiers to Solve Math Word Problems
-
Learning Transferable Visual Models From Natural Language Supervision
-
Scaling Laws for Neural Language Models
-
Dota 2 with Large Scale Deep Reinforcement Learning
-
Improving Language Understanding by Generative Pre-Training (GPT-1)
-
Language Models are Unsupervised Multitask Learners
-
Proximal Policy Optimization Algorithms
