
Senior Site Reliability Engineer

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twins to transform industries.
Santa Clara, CA, USA · Westford, MA 01886, USA · Austin, TX, USA
$168,000 - $322,000
Site Reliability
Senior Software Engineer
In-Person
5,000+ Employees
10+ years of experience
AI · Enterprise SaaS

Description For Senior Site Reliability Engineer

NVIDIA, the world leader in accelerated computing, is seeking a Senior Site Reliability Engineer to join its team. This role offers an opportunity to shape the future of computing and ensure the smooth operation of cutting-edge technologies. The position involves working with AI and pioneering solutions that have significant global impact.

As an SRE at NVIDIA, you'll be responsible for maintaining and improving critical infrastructure across a globally distributed, multi-cloud hybrid environment including AWS, GCP, and on-premises systems. The role requires deep technical expertise in Kubernetes, cloud platforms, and modern DevOps practices, combined with strong coding abilities in languages like Python and Go.

The ideal candidate brings 10+ years of experience in building and supporting critical services, with particular emphasis on infrastructure automation, service reliability, and performance optimization. You'll work closely with cross-functional teams, own end-to-end solutions, and participate in on-call rotations to ensure maximum system uptime.

This position offers competitive compensation with a base salary range of $168,000 - $322,000 (depending on level), plus equity and comprehensive benefits. Multiple locations are available including Santa Clara, Westford, Austin, and Durham, providing flexibility for qualified candidates. NVIDIA's commitment to diversity and inclusion, combined with their position at the forefront of AI and accelerated computing, makes this an exceptional opportunity for experienced SRE professionals looking to make a significant impact.

Last updated 18 days ago

Responsibilities For Senior Site Reliability Engineer

  • Own and build solutions while collaborating with cross-functional teams
  • Improve solution provisioning and management through automation
  • Identify areas to improve service resiliency
  • Detect performance issues and recommend solutions
  • Conduct capacity management and planning
  • Participate in incident reviews and write RCA reports
  • Deliver SRE solutions in a multi-cloud hybrid environment
  • Ensure high uptime and QoS for internal customers
  • Participate in on-call rotation

Requirements For Senior Site Reliability Engineer

Kubernetes · Python · Go · Linux
  • B.S. degree in Computer Science or related technical field with 10+ years of experience
  • Proficiency in Kubernetes administration and Infrastructure as Code
  • Deep understanding of Linux operating systems and TCP/IP
  • Expertise with a major cloud service provider (AWS, GCP, Azure)
  • Demonstrated proficiency with end-to-end SRE capabilities
  • Proficient in monitoring, metrics gathering, APM, and log collection
  • 5+ years of coding experience in Python, Go, Ruby, or Groovy
  • Creative problem solver with excellent debugging and communication skills

Benefits For Senior Site Reliability Engineer

  • Equity
  • Benefits package available
