Senior AI Observability Engineer

NVIDIA

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twins technology to transform industries.

Santa Clara, CA, USA • Seattle, WA, USA

$184,000 - $356,500

DevOps

Senior Software Engineer

Hybrid

5,000+ Employees

8+ years of experience

AI · Enterprise SaaS

Description For Senior AI Observability Engineer

NVIDIA, the world leader in accelerated computing, is seeking a Senior AI Observability Engineer to join its AI Infrastructure organization. This role focuses on architecting and implementing distributed observability systems for AI and HPC clusters, working directly with NVIDIA's growing AI, Hardware, and Software engineering teams.

The position involves developing sophisticated systems for data collection, aggregation, enrichment, storage, retrieval, and visualization to enhance the efficiency and performance of AI and HPC workloads. You'll be responsible for deploying and operating observability solutions across multiple global compute clusters.

The ideal candidate has 8+ years of experience with distributed observability systems and a strong background in Python programming. Experience with platforms like Apache Spark, Elasticsearch, Grafana, and Prometheus is essential. The role requires both technical expertise and strong collaborative skills, as you'll be working closely with data scientists, researchers, and engineering teams.

NVIDIA offers competitive compensation with a base salary range of $184,000 - $356,500 USD (depending on level), plus equity and comprehensive benefits. The company is committed to fostering a diverse and inclusive work environment, making it an excellent opportunity for professionals looking to make an impact in the AI and accelerated computing space.

This role presents an exciting opportunity to work at the forefront of AI infrastructure, helping to build and maintain the systems that power NVIDIA's cutting-edge research and development. The position combines technical challenges with strategic thinking, requiring someone who can both architect complex systems and understand the broader business impact of their work.

Last updated 13 days ago

Responsibilities For Senior AI Observability Engineer

Collaborate with AI, hardware, and software engineering teams and research teams to deliver observability solutions
Develop, test, and deploy data collectors, pipelines, visualization and retrieval services
Build a self-serve platform
Define data collection and retention policies
Provide operational and strategic data to improve performance and efficiency
Continuously improve quality, workloads, and processes through better observability

Requirements For Senior AI Observability Engineer

Python

Kubernetes

Experience developing large scale, distributed observability systems
Ability to collaborate with data scientists and engineering teams
Experience with turning raw data into actionable reports
Experience with observability platforms (Apache Spark, Elasticsearch/OpenSearch, Grafana, Prometheus)
Python programming experience, including making API calls
MS (preferred) or BS in Computer Science, Electrical Engineering, or related field
8+ years of proven experience
Excellent planning and interpersonal skills

Benefits For Senior AI Observability Engineer

Equity

Medical Insurance

