Staff Software Engineer, AI Reliability Engineering

Anthropic creates reliable, interpretable, and steerable AI systems, focusing on safe and beneficial AI development.
Salary: $320,000 - $485,000 USD
Level: Staff Software Engineer
Work arrangement: Hybrid
Company size: 501 - 1,000 employees
Experience: 8+ years
Industry: AI
This job posting may no longer be active.

Description For Staff Software Engineer, AI Reliability Engineering

Anthropic is seeking an experienced Staff Software Engineer to join its AI Reliability Engineering team. This role is crucial for ensuring the reliability and performance of Anthropic's AI systems, both internal and customer-facing. The position combines traditional site reliability engineering with the unique challenges of AI infrastructure.

The role involves developing and maintaining service level objectives for large language model systems, implementing comprehensive monitoring solutions, and building high-availability infrastructure capable of serving millions of customers. You'll be responsible for creating automated failover systems across multiple regions and cloud providers, leading incident response for critical AI services, and optimizing costs for large-scale AI infrastructure.

The ideal candidate brings extensive experience in distributed systems observability and monitoring at scale, with a deep understanding of AI infrastructure operations. You should be comfortable working with both traditional infrastructure metrics and AI-specific performance indicators. Experience with chaos engineering, resilience testing, and maintaining SLO/SLA frameworks is essential.

Anthropic offers a competitive compensation package ranging from $320,000 to $485,000 USD, along with benefits including equity donation matching, generous vacation and parental leave, and flexible working hours. The position is hybrid, requiring at least 25% of working time in one of its offices in San Francisco, New York City, or Seattle.

The company is committed to developing safe and beneficial AI systems, working as a cohesive team on large-scale research efforts. They value impact-focused work and view AI research as an empirical science. The collaborative environment includes frequent research discussions and emphasizes effective communication skills.

This is an opportunity to play a crucial role in ensuring the reliability and safety of cutting-edge AI systems while working with a team dedicated to beneficial AI development. The position offers the chance to work on unprecedented technical challenges while contributing to Anthropic's mission of creating reliable, interpretable, and steerable AI systems.

Last updated 2 months ago

Responsibilities For Staff Software Engineer, AI Reliability Engineering

  • Develop Service Level Objectives for large language model serving and training systems
  • Design and implement monitoring systems for availability, latency and other metrics
  • Design and implement high-availability language model serving infrastructure
  • Develop automated failover and recovery systems across multiple regions and cloud providers
  • Lead incident response for critical AI services
  • Build and maintain cost optimization systems for large-scale AI infrastructure

Requirements For Staff Software Engineer, AI Reliability Engineering

  • Kubernetes
  • Linux
  • Extensive experience with distributed systems observability and monitoring at scale
  • Understanding of AI infrastructure operations
  • Experience implementing and maintaining SLO/SLA frameworks
  • Comfort with traditional and AI-specific metrics
  • Experience with chaos engineering and resilience testing
  • Ability to bridge ML engineers and infrastructure teams
  • Excellent communication skills
  • Bachelor's degree in related field or equivalent experience

Benefits For Staff Software Engineer, AI Reliability Engineering

  • Competitive compensation and benefits
  • Optional equity donation matching
  • Generous vacation and parental leave
  • Flexible working hours
  • Office space for collaboration
  • Visa sponsorship available