
Senior AI Infrastructure Engineer - DGX Cloud

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twins technology.
$148,000 - $287,500
Cloud
Senior Software Engineer
Hybrid
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Senior AI Infrastructure Engineer - DGX Cloud

NVIDIA is seeking a Senior AI Infrastructure Engineer for its DGX Cloud group, focusing on designing and maintaining large-scale production systems. This role combines software and systems engineering, requiring expertise in systems, networking, coding, database management, and cloud technologies. The position is part of NVIDIA's DGX Cloud SRE team, which keeps GPU cloud services reliable while managing system changes and capacity.

The role offers an opportunity to work with cutting-edge AI infrastructure at one of technology's most respected companies. You'll be responsible for building and maintaining the backbone of NVIDIA's AI training and inferencing platforms, working with multi-GPU clusters and distributed systems. The position combines hands-on technical work with strategic system design and planning.

NVIDIA's culture emphasizes diversity, intellectual curiosity, and problem-solving in a blame-free environment. The company encourages collaboration and risk-taking while providing support and mentorship for professional growth. As a leader in AI and accelerated computing, NVIDIA offers the chance to work on meaningful projects that impact various industries.

The position includes competitive compensation with a base salary range of $148,000 - $287,500 USD, plus equity and benefits. The role can be performed from Santa Clara, CA, or remotely from WA or CA, offering flexibility in work location while being part of a team that's pushing the boundaries of AI infrastructure.


Responsibilities For Senior AI Infrastructure Engineer - DGX Cloud

  • Design, build, deploy, and run internal tooling for large-scale AI training and inferencing platforms
  • Conduct performance characterization and analysis on large multi-GPU clusters
  • Engage in service lifecycle from design through deployment and refinement
  • Support services through system design consulting and tools development
  • Maintain services by monitoring availability, latency and system health
  • Scale systems through automation
  • Practice sustainable incident response
  • Participate in on-call rotation

Requirements For Senior AI Infrastructure Engineer - DGX Cloud

Python
Go
Linux
Kubernetes
  • BS degree in Computer Science or related technical field
  • 5+ years of experience
  • Experience with infrastructure automation and distributed systems design
  • Experience in Python, Go, C/C++, or Java
  • In-depth knowledge of Linux, networking, storage, and container technologies
  • Experience with public cloud, Infrastructure as Code (IaC), and Terraform
  • Distributed system experience
