Senior Site Reliability Engineer

NVIDIA

NVIDIA is the world leader in accelerated computing, pioneering solutions in AI and digital twins.

Pune, Maharashtra, India • Bengaluru, Karnataka, India

DevOps

Senior Software Engineer

In-Person

5,000+ Employees

5+ years of experience

AI · Robotics · Automotive

Description For Senior Site Reliability Engineer

NVIDIA is seeking an exceptional Senior Site Reliability Engineer to join their Infrastructure, Planning and Processes organization. This role is part of a dynamic team responsible for developing and maintaining sophisticated build & test environments for various hardware platforms including NVIDIA GPUs and Tegra Processors across multiple operating systems. The position offers an opportunity to work with cutting-edge technologies in AI, Robotics, and Autonomous Vehicles.

The ideal candidate will be responsible for implementing and managing Kubernetes architectures, establishing high-availability clusters, and developing automation tools. They will work with various business units within NVIDIA Software, including Graphics Processors, Mobile Processors, Deep Learning, and Artificial Intelligence teams. The role requires expertise in infrastructure as code, monitoring solutions, and cloud infrastructure development.

This role suits a seasoned SRE professional who thrives in a fast-paced environment and wants to work with state-of-the-art technology. It offers competitive compensation and benefits, and a chance to advance a career at one of the technology world's most desirable employers. NVIDIA's commitment to innovation in accelerated computing and AI means working on transformative technologies that affect many industries.

Last updated 2 months ago

Responsibilities For Senior Site Reliability Engineer

Implement Kubernetes architecture end to end: design, deployment, hardening, networking, sizing, and scaling
Implement high-availability clusters and disaster recovery solutions
Design and implement logging & monitoring solutions
Develop tools for automating workflows
Participate in prototyping and developing cloud infrastructure
Participate in on-call support and critical issue coverage
Implement critical metrics using various analytics methods and dashboards

Requirements For Senior Site Reliability Engineer

Kubernetes

Python

Linux

Solid programming background in Python/Go
5+ years of proven experience
Bachelor's or Master's degree in Computer Science, Software Engineering, or equivalent
Proficient in configuration management & IaC tools (Ansible, Puppet, Chef, Terraform)
Strong background with GitLab, Jenkins, Flux, and Argo CD
Strong expertise in Kubernetes architecture
Proficient in secret management tools
Proficient in data analytics/visualization & monitoring tools
Excellent debugging, problem-solving, and analytical skills

Benefits For Senior Site Reliability Engineer

Competitive salaries
Generous benefits package

