Site Reliability Engineer - Cloud

NVIDIA

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twins technology.

New York, NY, USA

$136,000 - $212,750

Site Reliability

Senior Software Engineer

Remote

5,000+ Employees

5+ years of experience

AI · Enterprise SaaS

Description For Site Reliability Engineer - Cloud

NVIDIA, a global leader in accelerated computing and AI technology, is seeking a Site Reliability Engineer to join its Digital Marketing Organization. This role combines technical expertise with operational excellence, focusing on maintaining and improving AWS infrastructure and ensuring the reliability of NVIDIA's Digital Marketing Services. The position offers an opportunity to work with cutting-edge technology at a company that has continuously reinvented itself over two decades.

The SRE will be responsible for ensuring all Digital Marketing Services are reliable, fast, and efficient. Key responsibilities include managing AWS infrastructure, implementing monitoring solutions, and automating deployment pipelines. The role requires strong expertise in Python, Kubernetes, and AWS, with a focus on maintaining high-availability systems and responding to critical incidents.

This is an excellent opportunity for an experienced engineer who thrives in a fast-paced environment and wants to make a significant impact. The position offers competitive compensation ($136,000 - $212,750) plus equity and benefits. NVIDIA's culture promotes diversity and innovation, making it an ideal workplace for those passionate about technology and its applications in AI and digital transformation.

The role combines the best aspects of software engineering and operations, requiring both technical depth and strong communication skills. You'll work with state-of-the-art tools and technologies while contributing to the infrastructure that powers NVIDIA's digital presence. The company's commitment to technological advancement and its position at the forefront of AI computing make this an exciting opportunity for career growth and development.

Last updated a day ago

Responsibilities For Site Reliability Engineer - Cloud

Rapidly debug and triage user-reported issues for the Digital Marketing Organization
On-board new applications and services on AWS Infrastructure
Contribute to the health, performance, and uptime of services running on Linux and Windows
Implement monitors, alerts, and SOPs for early detection of and response to service-impacting issues
Automate and create scripts for daily tasks

Requirements For Site Reliability Engineer - Cloud

Python · Java · Kubernetes · Linux

MS or BS in Computer Science/Engineering or related field or equivalent experience
5+ years of experience supporting technical operations in a production environment
Experience with critical production services on Windows or Linux
Strong knowledge of the Kubernetes platform, deployments, and automation
Advanced-level experience with Python scripting
Must live in an East Coast time zone
Experience with AWS Cloud Platform
SRE On-call experience

