Site Reliability Engineer - Cloud

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twins technology.
$136,000 - $212,750
Site Reliability
Senior Software Engineer
Remote
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Site Reliability Engineer - Cloud

NVIDIA, a global leader in accelerated computing and AI technology, is seeking a Site Reliability Engineer to join its Digital Marketing Organization. This role combines technical expertise with operational excellence, focusing on maintaining and improving AWS infrastructure and ensuring the reliability of NVIDIA's Digital Marketing Services. The position offers an opportunity to work with cutting-edge technology at a company that has continuously reinvented itself over two decades.

The SRE will be responsible for ensuring all Digital Marketing Services are reliable, fast, and efficient. Key responsibilities include managing AWS Infrastructure, implementing monitoring solutions, and automating deployment pipelines. The role requires strong expertise in Python, Kubernetes, and AWS, with a focus on maintaining high-availability systems and responding to critical incidents.

This is an excellent opportunity for an experienced engineer who thrives in a fast-paced environment and wants to make a significant impact. The position offers competitive compensation ($136,000 - $212,750) plus equity and benefits. NVIDIA's culture promotes diversity and innovation, making it an ideal workplace for those passionate about technology and its applications in AI and digital transformation.

The role combines the best aspects of software engineering and operations, requiring both technical depth and strong communication skills. You'll work with state-of-the-art tools and technologies while contributing to the infrastructure that powers NVIDIA's digital presence. The company's commitment to technological advancement and its position at the forefront of AI computing make this an exciting opportunity for career growth and development.

Last updated a day ago

Responsibilities For Site Reliability Engineer - Cloud

  • Rapidly debug and triage user-reported issues for the Digital Marketing Organization
  • On-board new applications and services on AWS Infrastructure
  • Contribute to the health, performance, and uptime of services running on Linux and Windows
  • Implement monitors, alerts and SOPs for early detection and response to service-impacting issues
  • Automate and create scripts for daily tasks

Requirements For Site Reliability Engineer - Cloud

Python · Java · Kubernetes · Linux
  • MS or BS in Computer Science/Engineering or related field or equivalent experience
  • 5+ years of experience supporting technical operations in a production environment
  • Experience with critical production services on Windows or Linux
  • Strong knowledge of the Kubernetes platform, deployments, and automation
  • Advanced-level experience with Python scripting
  • Must live in an East Coast time zone
  • Experience with AWS Cloud Platform
  • SRE on-call experience
