Senior AI Infrastructure Engineer - DGX Cloud

World leader in accelerated computing, pioneering AI and digital twins technology to transform industries.
$148,000 - $287,500
Cloud
Senior Software Engineer
Hybrid
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS
This job posting is no longer active.

Job Description

NVIDIA is seeking a Senior AI Infrastructure Engineer for its DGX Cloud group, focusing on designing and maintaining large-scale production systems. This role combines software and systems engineering, requiring expertise in systems, networking, coding, database management, and cloud technologies. The position is part of NVIDIA's DGX Cloud SRE team, which ensures reliable GPU cloud services while managing system changes and capacity.

The role involves working with cutting-edge AI infrastructure, including multi-GPU and multi-node clusters, making it an exciting opportunity for those passionate about high-performance computing and AI technologies. NVIDIA's culture emphasizes diversity, intellectual curiosity, and problem-solving in a blame-free environment that encourages collaboration and innovation.

As a Senior AI Infrastructure Engineer, you'll be responsible for building and maintaining the backbone of NVIDIA's AI training and inferencing platform. This includes designing sophisticated tooling systems, conducting performance analysis, and ensuring system reliability through careful monitoring and automation. The position offers a balance of technical challenges and operational responsibilities, including participation in on-call rotations.

NVIDIA's position as a leader in accelerated computing and AI makes this role particularly impactful. The company's work is transforming major industries through AI and digital twins technology. The compensation package is competitive, with a base salary range of $148,000 to $287,500, plus equity and comprehensive benefits.

The ideal candidate will bring strong technical skills in distributed systems, cloud infrastructure, and programming, combined with excellent problem-solving and communication abilities. This role offers the opportunity to work with some of the most advanced AI infrastructure while contributing to NVIDIA's mission of pushing the boundaries of technology.

Last updated 2 months ago

Responsibilities For Senior AI Infrastructure Engineer - DGX Cloud

  • Design, build, deploy, and run internal tooling for a large-scale AI training and inferencing platform
  • Conduct performance characterization and analysis on large multi-GPU clusters
  • Engage in service lifecycle from design through deployment and refinement
  • Support services through system design consulting and tools development
  • Maintain services by monitoring availability, latency and system health
  • Scale systems through automation
  • Practice sustainable incident response
  • Participate in on-call rotation

Requirements For Senior AI Infrastructure Engineer - DGX Cloud

Python
Go
Linux
Kubernetes
  • BS degree in Computer Science or related technical field
  • 5+ years of experience
  • Experience with infrastructure automation and distributed systems design
  • Experience in Python, Go, C/C++, or Java
  • In-depth knowledge of Linux, networking, storage, and container technologies
  • Experience with public cloud and Infrastructure as Code (IaC) tools such as Terraform
  • Distributed systems experience

Benefits For Senior AI Infrastructure Engineer - DGX Cloud

Medical Insurance
Equity
  • Equity compensation
  • Comprehensive benefits package