Senior System Software Engineer, Cloud Services

NVIDIA is the world leader in accelerated computing, pioneering solutions for AI and digital twins that transform industries.
$184,000 - $287,500
Cloud
Senior Software Engineer
Hybrid
5,000+ Employees
8+ years of experience
AI · Enterprise SaaS

Job Description

NVIDIA, the world leader in accelerated computing, is seeking a Senior System Software Engineer for its Cloud Services team. This role focuses on building and maintaining cloud-hosted authentication and authorization services across NVIDIA's ecosystem. The position combines deep technical expertise in AWS service development with a strong focus on operational excellence and observability practices.

The role involves architecting and implementing large-scale observability systems with monitoring tools such as Prometheus, Grafana, and OpenTelemetry. You'll be responsible for ensuring the reliability, performance, and scalability of critical authentication services while providing actionable insights for continuous improvement.

As a senior engineer, you'll collaborate with cross-functional teams, drive automation initiatives, and participate in on-call rotations. The position offers an opportunity to work with modern cloud technologies, containerization platforms, and infrastructure-as-code tools. The team also manages custom front-end services using React for admin functions.

This is an opportunity for an experienced systems engineer who wants to work on critical infrastructure at a company pioneering AI and digital twin technology. The role offers competitive compensation ($184,000-$287,500) plus equity and benefits, and the chance to work with global teams on challenging technical problems. NVIDIA's commitment to diversity and inclusion makes it an attractive workplace for professionals from all backgrounds.

Last updated a day ago

Responsibilities For Senior System Software Engineer, Cloud Services

  • Architect, implement, and maintain observability systems at scale
  • Define and refine service-level indicators (SLIs), service-level objectives (SLOs), and error budgets
  • Create and maintain actionable dashboards for system health monitoring
  • Collaborate with teams to integrate observability throughout the application lifecycle
  • Drive automation efforts to reduce manual monitoring work
  • Address performance and reliability issues through root cause analysis
  • Participate in PagerDuty on-call rotations
  • Develop expertise in the team's service offerings and manage support channels

Requirements For Senior System Software Engineer, Cloud Services

Python · Go · JavaScript · React · Kubernetes · Cassandra
  • Bachelor's or master's degree in computer science, engineering, or equivalent experience
  • 8+ years in large-scale systems engineering roles
  • Experience with modern monitoring systems (Prometheus, Grafana, Loki, Tempo, Datadog, etc.)
  • Advanced coding skills in Python, Go, or similar languages
  • Proficiency in cloud platforms and containerized environments
  • Strong communication and collaboration skills
  • Experience with incident management and postmortem processes
  • Comfort with JavaScript frameworks like React and Next.js
