Software Engineer (Site Reliability Engineer)

Anyscale

Anyscale commercializes Ray, an open-source project creating an ecosystem of libraries for scalable machine learning, making distributed computing accessible to developers.

San Francisco, CA, USA • Palo Alto, CA, USA

$180,600 - $200,900

Site Reliability

Senior Software Engineer

Hybrid

3+ years of experience

AI · Enterprise SaaS

This job posting is no longer active.

Job Description

Anyscale, backed by more than $250 million in funding from prominent investors, is revolutionizing distributed computing through Ray, their open-source project. They're building a platform that allows developers and data scientists to scale ML applications from laptop to cluster without deep distributed systems expertise.

As a Site Reliability Engineer at Anyscale, you'll be instrumental in maintaining the reliability and performance of their production systems and user-facing services. The role combines engineering excellence with operational expertise, focusing on building robust systems for monitoring, observability, and deployment automation.

Key responsibilities include developing a comprehensive view of cloud component utilization, implementing effective deployment methodologies, and building sophisticated monitoring and alerting systems. You'll also be responsible for establishing testing infrastructure and defining organization-wide SLOs.

The position offers an attractive compensation package ranging from $180.6K to $200.9K, complemented by equity and comprehensive benefits including healthcare, 401k, and various stipends. The hybrid work environment in either San Francisco or Palo Alto provides flexibility while maintaining collaborative opportunities.

This role is perfect for experienced SREs who want to work at the intersection of distributed systems and ML infrastructure, helping shape the future of AI application deployment. You'll be joining a company that powers the ML infrastructure of major tech companies like OpenAI, Uber, and Spotify, making a significant impact on the AI ecosystem.

The ideal candidate should have at least 3 years of relevant experience and a passion for building reliable, scalable systems. Anyscale values diversity and inclusion, welcoming applications from all backgrounds and providing equal opportunities for growth and success.

Last updated 3 days ago

Responsibilities For Software Engineer (Site Reliability Engineer)

Develop a unified perspective on cloud component utilization across the company
Ensure deployment methodologies align with reliability goals
Build systems for production environment understanding and observability
Create monitoring and alerting systems at different levels
Establish testing infrastructure
Develop tools for measuring service level objectives (SLOs)
Implement best practices and on-call systems
Coordinate creation and deployment of cloud-based services

Requirements For Software Engineer (Site Reliability Engineer)

At least 3 years of relevant work experience in a similar role

Benefits For Software Engineer (Site Reliability Engineer)

Equity

Medical Insurance

401k

Education Budget

Parental Leave

Commuter Benefits

Stock Options
Healthcare plans with 99% premium coverage
401k Retirement Plan
Wellness stipend
Education stipend
Paid Parental Leave
Flexible Time Off
Commute reimbursement
100% of in-office meals covered