Software Engineer, Infrastructure

Anthropic

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole.
London, UK
$225,000 - $390,000
Backend · Cloud · DevOps...
Staff Software Engineer
Hybrid
12+ years

Description

Anthropic is seeking talented and experienced Infrastructure Engineers to join our team and support the development, scaling, and maintenance of our cutting-edge AI systems. By joining our Infrastructure team, you will have the opportunity to work on groundbreaking AI technologies and contribute to the development of frontier models, supporting Anthropic's mission to create safe and reliable AI systems that benefit humanity.

We currently have openings on our Data Infrastructure, Research Infrastructure, Site Reliability Engineering, Systems, and Observability teams. Each team plays a crucial role in ensuring the reliability, scalability, and efficiency of our AI systems.

As a Software Engineer in Infrastructure at Anthropic, you will be responsible for leading the build-out of industry-leading AI clusters, consulting with stakeholders to understand infrastructure needs, setting technical strategies, and mentoring top talent. You will work with cutting-edge technologies and cloud services to design and implement scalable solutions that support our AI research and products.

The ideal candidate will have 12+ years of relevant industry experience, with at least 3 years leading large-scale, complex projects or teams. You should be passionate about distributed systems at scale, infrastructure reliability, and continuous improvement. Strong proficiency in at least one programming language (e.g., Python, Rust, Go, or Java) is required, along with deep knowledge of modern cloud infrastructure, including Kubernetes, Infrastructure as Code, AWS, and GCP.

At Anthropic, we offer a competitive compensation package, including salary, equity, and comprehensive benefits. We provide a collaborative and innovative work environment where you can make a significant impact on the future of AI technology. Join us in our mission to create reliable, interpretable, and steerable AI systems that benefit humanity.

Responsibilities

  • Lead the build-out of industry-leading AI clusters (thousands to hundreds of thousands of machines), partnering closely with cloud service providers on cluster build-out and required features
  • Consult with different stakeholders to deeply understand infrastructure, data, and compute needs, identifying potential solutions to support frontier research and product development
  • Set technical strategy and oversee development of high-scale, reliable infrastructure systems
  • Mentor top technical talent
  • Design processes (e.g. postmortem review, incident response, on-call rotations) that help the team operate effectively and never fail the same way twice

Requirements

Kubernetes · Python · Rust · Go · Java
  • 12+ years of relevant industry experience, including 3+ years leading large-scale, complex projects or teams as an engineer or tech lead
  • Obsessed with distributed systems at scale, infrastructure reliability, scalability, security, and continuous improvement
  • Strong proficiency in at least one programming language (e.g., Python, Rust, Go, Java)
  • Strong problem-solving skills and ability to work independently
  • Passion for supporting internal partners, such as research teams, and understanding their needs
  • Excellent communication skills to build consensus with stakeholders, both internally and externally
  • Deep knowledge of modern cloud infrastructure including Kubernetes, Infrastructure as Code, AWS, and GCP

Benefits

  • Optional equity donation matching
  • Private health, dental, and vision insurance for you and your dependents
  • Pension contribution (matching 4% of your salary)
  • 21 weeks of paid parental leave
  • Unlimited PTO – most staff take 4 to 6 weeks each year, sometimes more
  • Health cash plan
  • Life insurance and income protection
  • Daily lunches and snacks in our office
