Distinguished Engineer, Observability, Monitoring, and Remediation

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twin technology.
$308,000 - $471,500
Cloud
Principal Software Engineer
In-Person
5,000+ Employees
18+ years of experience
AI · Enterprise SaaS

Job Description

NVIDIA, the industry leader in accelerated computing, is seeking a Distinguished Engineer to lead its observability, monitoring, and remediation initiatives for DGX Cloud. This role combines deep technical expertise with strategic leadership, focusing on building robust infrastructure for accelerated computing environments.

The position requires an exceptional leader with 18+ years of experience who will shape the technical strategy for collecting, analyzing, and managing observability data across all infrastructure layers. You'll be responsible for developing sophisticated auto-remediation strategies and ensuring the highest standards of availability and operational excellence.

As a Distinguished Engineer at NVIDIA, you'll work at the intersection of cloud infrastructure, AI, and enterprise computing. You'll collaborate with NVIDIA leadership and external partners to define industry standards for accelerated computing infrastructure. The role offers the opportunity to work with cutting-edge technology while solving complex challenges in cloud computing and infrastructure management.

The ideal candidate brings extensive experience in cloud-native architectures, demonstrated success in delivering high-impact solutions, and strong leadership capabilities. You'll be part of NVIDIA's mission to transform industries through AI and accelerated computing, working with some of the most innovative minds in technology.

NVIDIA offers a competitive compensation package with a base salary range of $308,000 - $471,500, plus equity and comprehensive benefits. This role presents an exceptional opportunity to shape the future of cloud infrastructure while working for a global technology leader known for pushing the boundaries of innovation.

Last updated 10 days ago

Responsibilities For Distinguished Engineer, Observability, Monitoring, and Remediation

  • Define and drive technical implementation for DGX Cloud offerings in observability, monitoring, and remediation
  • Drive awareness of technical capabilities and integrate technical strategy into DGX Cloud engineering practices
  • Guide technical delivery for DGX Cloud systems across enterprise, public cloud, and high-security deployments
  • Collaborate with customers, infrastructure providers, and strategic partners
  • Ensure DGX Cloud, customers, and partners achieve operational excellence
  • Lead the technical aspects of planning and continuous evolution across a large technical scope

Requirements For Distinguished Engineer, Observability, Monitoring, and Remediation

Skills: Kubernetes
  • 18+ years in technical roles focusing on observability and monitoring for cloud infrastructure
  • 5+ years of lead experience
  • BS/MS or higher in systems/software engineering or related fields
  • Technical proficiency in multi-tenant data center and cloud-native architectures
  • Experience with bare metal, virtualization, containerization, and higher-level abstractions
  • Proven success delivering high-impact, technically sophisticated solutions
  • Strong technical leadership abilities
  • Strong collaboration and influence skills

Benefits For Distinguished Engineer, Observability, Monitoring, and Remediation

  • Competitive base salary
  • Equity
  • Additional benefits package

Related Jobs

Principal Cloud Software Engineer

Principal Cloud Software Engineer position at NVIDIA, leading GPU cloud services development with 15+ years experience required, offering $272K-$425.5K salary plus benefits.

Principal Cloud Software Engineer

Principal Cloud Software Engineer position at NVIDIA focusing on GPU Cloud computing and AI, offering competitive salary and benefits.

Distinguished Engineer, Observability, Monitoring, and Remediation

Lead observability and monitoring strategies for NVIDIA's DGX Cloud as a Distinguished Engineer, driving technical implementation and automation across cloud infrastructure.

Principal Engineer - Enterprise Applications

Lead enterprise IT systems development at NVIDIA, designing and implementing scalable solutions while driving innovation in cloud computing and AI/ML technologies.