Performance Engineer
Anthropic
Hybrid
San Francisco, CA, USA
Full-time
$315,000 - $625,000
About Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About the Role
Running machine learning (ML) algorithms at our scale often requires solving novel systems problems. As a Performance Engineer, you'll be responsible for identifying these problems and then developing systems that optimize the throughput and robustness of our largest distributed systems. Strong candidates will have a track record of solving large-scale systems problems and will be excited to grow into ML experts as well.
Qualifications
Have significant software engineering or machine learning experience, particularly at supercomputing scale
Are results-oriented, with a bias towards flexibility and impact
Pick up slack, even if it goes outside your job description
Enjoy pair programming (we love to pair!)
Want to learn more about machine learning research
Care about the societal impacts of your work
Relevant experience may include:
High performance, large-scale ML systems
GPU/Accelerator programming
ML framework internals
OS internals
Language modeling with transformers
Responsibilities
Implement low-latency, high-throughput sampling for large language models
Implement GPU kernels to adapt our models to low-precision inference
Write a custom load-balancing algorithm to optimize serving efficiency
Build quantitative models of system performance
Design and implement a fault-tolerant distributed system running with a complex network topology
Debug kernel-level network latency spikes in a containerized environment

