
Site Reliability Engineer (L4/L5)

Netflix is one of the world's leading entertainment services with over 300 million paid memberships in 190+ countries.
Site Reliability
Mid-Level Software Engineer
In-Person
5,000+ Employees
3+ years of experience
Enterprise SaaS · Entertainment

Description For Site Reliability Engineer (L4/L5)

Netflix, a global entertainment leader serving over 300 million subscribers worldwide, is seeking a Site Reliability Engineer to join its N-Tech SRE team. This role focuses on enhancing the reliability and resilience of the internal services Netflix employees use every day. As an SRE, you'll implement best practices, automation, and proactive measures to maintain system reliability while reducing manual operations through sophisticated tooling.

The position offers an exciting opportunity to work with complex distributed systems at scale, requiring both technical expertise and a deep understanding of how systems fail. You'll collaborate with cross-functional teams to integrate observability, reliability, and security considerations throughout the software development lifecycle. The role involves hands-on work with modern technologies including cloud platforms, containerization, and infrastructure as code.

This is an ideal position for engineers who are passionate about solving complex technical challenges and have experience with large-scale systems. The role requires both technical depth and excellent communication skills, as you'll be working closely with product teams to improve system reliability and implement robust incident response frameworks.

Netflix offers a unique culture that values inclusion, innovation, and continuous learning. The company provides an environment where you can make a significant impact on the reliability of systems that support Netflix's global operations. If you're excited about working with cutting-edge technology, solving complex problems, and being part of a team that values both technical excellence and human factors in engineering, this role offers an exceptional opportunity to grow your career at one of the world's leading technology companies.

Last updated 3 days ago

Responsibilities For Site Reliability Engineer (L4/L5)

  • Design, implement, and maintain scalable and reliable infrastructure
  • Collaborate with engineering and product teams on observability, reliability, and security
  • Develop and implement automation tools for monitoring, deployment, and incident response
  • Conduct capacity planning, performance analysis, and system tuning
  • Participate in on-call rotations and incident response
  • Implement and improve monitoring and alerting systems
  • Implement and maintain disaster recovery and business continuity plans
  • Evaluate and recommend improvements for system observability and reliability
  • Identify sources of instability in distributed systems
  • Engage with product teams to diagnose operational issues
  • Implement and maintain an incident response framework
  • Champion a growth mindset and a culture of continuous learning

Requirements For Site Reliability Engineer (L4/L5)

Python · Go · Java · JavaScript · Node.js · Kubernetes
  • 3+ years of experience as a Site Reliability Engineer or similar role
  • Strong scripting and programming skills (Python, Go, Java, or JavaScript/Node.js)
  • Experience with complex sociotechnical systems and operations at scale
  • Experience with incident management and response
  • Experience with infrastructure as code (such as Terraform) and container orchestration tools
  • Experience with cloud platforms such as AWS and with microservices architecture
  • Excellent communication & collaboration skills
  • Proven ability to cultivate relationships through influence
  • Proven ability to troubleshoot complex issues
  • Familiarity with Human Factors Engineering
  • Ability to grow expertise, influence & educate others
