NVIDIA, the world leader in accelerated computing, is seeking a Senior Site Reliability Engineer to join its team. The role offers an opportunity to shape the future of computing while ensuring the smooth operation of cutting-edge technologies. It involves working with AI and pioneering solutions that have significant global impact.
As an SRE at NVIDIA, you'll be responsible for maintaining and improving critical infrastructure across a globally distributed, hybrid multi-cloud environment that spans AWS, GCP, and on-premises systems. The role requires deep technical expertise in Kubernetes, cloud platforms, and modern DevOps practices, combined with strong coding abilities in languages such as Python and Go.
The ideal candidate brings 10+ years of experience in building and supporting critical services, with particular emphasis on infrastructure automation, service reliability, and performance optimization. You'll work closely with cross-functional teams, own end-to-end solutions, and participate in on-call rotations to ensure maximum system uptime.
This position offers competitive compensation, with a base salary range of $168,000 to $322,000 depending on level, plus equity and comprehensive benefits. The role is available in multiple locations, including Santa Clara, Westford, Austin, and Durham, which gives qualified candidates some flexibility. NVIDIA's commitment to diversity and inclusion, together with its position at the forefront of AI and accelerated computing, makes this an exceptional opportunity for experienced SRE professionals looking to make a significant impact.