Senior Site Reliability Engineer

NVIDIA is the world leader in accelerated computing, pioneering solutions in AI and digital twins that transform industries.
DevOps
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Robotics · Automotive

Description For Senior Site Reliability Engineer

NVIDIA, the world leader in accelerated computing, is seeking a Senior Site Reliability Engineer to join its Infrastructure, Planning and Processes organization. The team develops and maintains sophisticated build & test environments for hardware platforms, including NVIDIA GPUs and Tegra Processors, across multiple operating systems. The position involves working with diverse business units, including Graphics Processors, Mobile Processors, Deep Learning, Artificial Intelligence, Robotics, and Driverless Cars.

The ideal candidate will be responsible for implementing and managing Kubernetes architectures, developing automation tools, and ensuring high availability of systems. They will work with cutting-edge technologies in cloud infrastructure, containerization, and DevOps practices. The role requires strong expertise in programming (Python/Go), infrastructure as code, and modern monitoring solutions.

This is an excellent opportunity for an experienced SRE professional to work with state-of-the-art technology at a company that's driving innovation in AI, gaming, and autonomous vehicles. The position offers competitive compensation and benefits, along with the chance to work alongside some of the most forward-thinking professionals in the technology industry. The role combines technical depth with the opportunity to impact critical infrastructure supporting NVIDIA's work in accelerated computing.

Responsibilities For Senior Site Reliability Engineer

  • Implement Kubernetes architecture end to end: design, deployment, hardening, networking, sizing, and scaling
  • Implement high availability clusters and disaster recovery solutions
  • Design and implement logging & monitoring solutions
  • Develop tools for automating workflows
  • Participate in prototyping and developing cloud infrastructure
  • Participate in on-call support and critical issue coverage
  • Implement critical metrics using various analytics methods and dashboards

Requirements For Senior Site Reliability Engineer

Kubernetes
Python
Go
Linux
  • Solid programming background in Python/Go
  • 5+ years of proven experience
  • Bachelor's or Master's degree in Computer Science, Software Engineering, or equivalent
  • Proficient in configuration management & IaC tools (Ansible, Puppet, Chef, Terraform)
  • Strong expertise in Kubernetes architecture, networking, RBAC, persistent storage solutions
  • Strong background with GitLab, Jenkins, Flux, Argo CD
  • Proficient in secret management tools
  • Proficient in data analytics/visualization & monitoring tools
  • Excellent debugging, problem-solving, and analytical skills

Benefits For Senior Site Reliability Engineer

  • Competitive salaries
  • Generous benefits package
