Senior Site Reliability Engineer

NVIDIA

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twins to transform industries.

Santa Clara, CA, USA • Westford, MA 01886, USA • Austin, TX, USA…

$168,000 - $322,000

Site Reliability

Senior Software Engineer

In-Person

5,000+ Employees

10+ years of experience

AI · Enterprise SaaS

Description For Senior Site Reliability Engineer

NVIDIA, the world leader in accelerated computing, is seeking a Senior Site Reliability Engineer to join its team. This role offers an opportunity to shape the future of computing and ensure the smooth operation of cutting-edge technologies. The position involves working with AI and pioneering solutions that have significant global impact.

As an SRE at NVIDIA, you'll be responsible for maintaining and improving critical infrastructure across a globally distributed, multi-cloud hybrid environment including AWS, GCP, and on-premises systems. The role requires deep technical expertise in Kubernetes, cloud platforms, and modern DevOps practices, combined with strong coding abilities in languages like Python and Go.

The ideal candidate brings 10+ years of experience in building and supporting critical services, with particular emphasis on infrastructure automation, service reliability, and performance optimization. You'll work closely with cross-functional teams, own end-to-end solutions, and participate in on-call rotations to ensure maximum system uptime.

This position offers competitive compensation with a base salary range of $168,000 - $322,000 (depending on level), plus equity and comprehensive benefits. Multiple locations are available, including Santa Clara, Westford, Austin, and Durham, providing flexibility for qualified candidates. NVIDIA's commitment to diversity and inclusion, combined with its position at the forefront of AI and accelerated computing, makes this an exceptional opportunity for experienced SRE professionals looking to make a significant impact.

Last updated 18 days ago

Responsibilities For Senior Site Reliability Engineer

Own and build solutions while collaborating with cross-functional teams
Improve solution provisioning and management through automation
Identify areas to improve service resiliency
Detect performance issues and recommend solutions
Conduct capacity management and planning
Participate in incident reviews and write RCA reports
Deliver SRE solutions in a multi-cloud hybrid environment
Ensure high uptime and QoS for internal customers
Participate in on-call rotation

Requirements For Senior Site Reliability Engineer

Kubernetes

Python

Linux

B.S. degree in Computer Science or a related technical field, with 10+ years of experience
Proficiency in Kubernetes administration and Infrastructure as Code
Deep understanding of Linux operating systems and TCP/IP
Expertise with a major cloud service provider (AWS, GCP, or Azure)
Demonstrated proficiency with end-to-end SRE capabilities
Proficient in monitoring, metrics gathering, APM, and log collection
5+ years of coding experience in Python, Go, Ruby, or Groovy
Creative problem solver with excellent debugging and communication skills

Benefits For Senior Site Reliability Engineer

Equity
Benefits package available
