Papers

  • Agentic LLMs as Powerful Deanonymizers: Re-identification of Participants in the Anthropic Interviewer Dataset
    Anthropic / Northeastern University, China
    Published on: 2026-02-09 1 author
  • The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?
    Anthropic / University of Edinburgh
    Published on: 2026-01-30
  • Unsupervised decoding of encoded reasoning using language model interpretability
    Published on: 2025-12-06 1 author
  • Anthropic Economic Index report: Uneven geographic and enterprise AI adoption
    Published on: 2025-11-19 1 author
  • Signs of introspection in large language models
    Published on: 2025-10-29 1 author
  • Evaluating honesty and lie detection techniques on a diverse suite of dishonest models
    Published on: 2025-10-25 1 author
  • SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents
    Anthropic / Redwood Research
    Published on: 2025-07-08 1 author
  • On the Biology of a Large Language Model
    Published on: 2025-03-27 1 author
  • Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
    Anthropic / Safeguards Research Team
    Published on: 2025-01-31 1 author
  • Alignment faking in large language models
    Anthropic / New York University
    Published on: 2024-12-18 1 author
  • Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
    Anthropic / University of Oxford
    Published on: 2024-06-29 1 author
  • Claude 3.5 Sonnet Model Card Addendum
    Published on: 2024-06-20 1 author
  • Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
    Published on: 2024-05-21 1 author
  • The Claude 3 Model Family: Opus, Sonnet, and Haiku
    Published on: 2024-03-04 1 author
  • Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
    Published on: 2024-01-17 1 author
  • Constitutional AI: Harmlessness from AI Feedback
    Published on: 2022-12-15 1 author
  • Toy Models of Superposition
    Anthropic / Harvard University
    Published on: 2022-09-14 1 author
  • Softmax Linear Units
    Published on: 2022-06-27 1 author
  • Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
    Published on: 2022-04-12 1 author
  • A Mathematical Framework for Transformer Circuits
    Published on: 2021-12-22 1 author