Senior AI Infrastructure Engineer - DGX Cloud

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twin technology.
$184,000 - $356,500
Cloud
Senior Software Engineer
Hybrid
5,000+ Employees
6+ years of experience
AI · Enterprise SaaS

Job Description

NVIDIA is seeking a Senior AI Infrastructure Engineer for its DGX Cloud group to design and maintain large-scale production systems. The role combines software and systems engineering and requires expertise in systems, networking, coding, database management, and cloud technologies. As part of the DGX Cloud SRE team, you'll keep GPU cloud services reliable while managing system changes and capacity.

The position offers an opportunity to work with cutting-edge AI infrastructure at NVIDIA, a leader in accelerated computing and AI technology. You'll be responsible for building and maintaining the backbone of NVIDIA's AI training and inferencing platforms, working with multi-GPU clusters and distributed systems.

The role demands both technical expertise and collaborative skills, with opportunities to influence system design and implementation. NVIDIA's culture emphasizes diversity, intellectual curiosity, and problem-solving in a blame-free environment. The company encourages self-direction while providing support and mentorship for professional growth.

This is an excellent opportunity for experienced engineers passionate about large-scale distributed systems and AI infrastructure. Compensation includes a base salary range of $184,000 - $356,500 (depending on level), equity, and benefits. NVIDIA's position as a technology leader and its commitment to AI and High-Performance Computing make the role a strong fit for those looking to make an impact in AI infrastructure.

Last updated 8 days ago

Responsibilities For Senior AI Infrastructure Engineer - DGX Cloud

  • Design, build, deploy, and run internal tooling for large-scale AI training and inferencing platforms
  • Conduct performance characterization and analysis on large multi-GPU clusters
  • Engage in service lifecycle from design through deployment and refinement
  • Support services through system design consulting and tools development
  • Maintain services by monitoring availability and system health
  • Scale systems through automation
  • Practice sustainable incident response
  • Participate in on-call rotation

Requirements For Senior AI Infrastructure Engineer - DGX Cloud

Python
Go
Kubernetes
Linux
  • BS degree in Computer Science or related technical field
  • 6+ years of experience
  • Experience with infrastructure automation and distributed systems design
  • Experience in Python, Go, C/C++, or Java
  • In-depth knowledge of Linux, networking, storage, and container technologies
  • Experience with public cloud and Infrastructure as Code (IaC) tools such as Terraform
  • Distributed system experience

Benefits For Senior AI Infrastructure Engineer - DGX Cloud

  • Equity
