Anthropic is seeking an experienced Staff Software Engineer to join its AI Reliability Engineering team. The role is central to ensuring the reliability and performance of Anthropic's AI systems, both internal and customer-facing, and combines traditional site reliability engineering with the unique challenges of AI infrastructure.
The role involves developing and maintaining service level objectives for large language model systems, implementing comprehensive monitoring solutions, and building high-availability infrastructure capable of serving millions of customers. You'll be responsible for creating automated failover systems across multiple regions and cloud providers, leading incident response for critical AI services, and optimizing costs for large-scale AI infrastructure.
The ideal candidate brings extensive experience in distributed systems observability and monitoring at scale, with a deep understanding of AI infrastructure operations. You should be comfortable working with both traditional infrastructure metrics and AI-specific performance indicators. Experience with chaos engineering, resilience testing, and maintaining SLO/SLA frameworks is essential.
Anthropic offers a competitive compensation package ranging from $320,000 to $485,000 USD, along with benefits including equity donation matching, generous vacation and parental leave, and flexible working hours. The position is hybrid, requiring at least 25% of working time in one of Anthropic's offices in San Francisco, New York City, or Seattle.
The company is committed to developing safe and beneficial AI systems and works as a cohesive team on large-scale research efforts. It values impact-focused work and views AI research as an empirical science. The collaborative environment includes frequent research discussions and places a premium on effective communication skills.
This is an opportunity to play a key role in ensuring the reliability and safety of cutting-edge AI systems alongside a team dedicated to beneficial AI development. The position involves unprecedented technical challenges and contributes directly to Anthropic's mission of creating reliable, interpretable, and steerable AI systems.