Senior AI Infrastructure Engineer - DGX Cloud

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twin technology.
$148,000 - $287,500
Cloud
Senior Software Engineer
Hybrid
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Senior AI Infrastructure Engineer - DGX Cloud

NVIDIA is seeking a Senior AI Infrastructure Engineer to join its DGX Cloud SRE team, which designs and maintains large-scale production systems and ensures reliable GPU cloud services for both internal and external users. The role combines software and systems engineering practices and requires expertise in systems, networking, coding, database management, and cloud technologies.

The role demands a strong background in infrastructure automation and distributed systems design, along with experience in modern cloud technologies such as Kubernetes and OpenStack. The ideal candidate will have 5+ years of experience and a BS in Computer Science or a related field, with expertise in languages such as Python, Go, C/C++, or Java. Knowledge of Linux, networking, and container technologies is essential.

NVIDIA offers a competitive compensation package with a base salary range of $148,000 - $287,500 USD, plus equity and benefits. The company is known for its innovative work in AI, High-Performance Computing, and Visualization, and the GPU is its groundbreaking invention. It promotes a culture of diversity, intellectual curiosity, and problem-solving, encouraging collaboration and risk-taking in a blame-free environment.

This position offers the opportunity to work on meaningful projects while receiving support and mentorship for professional growth. The role involves being part of a team that ensures maximum reliability and uptime of GPU cloud services while managing system changes, capacity, and performance. The work environment is dynamic and forward-thinking, perfect for creative and autonomous professionals passionate about advancing technology.

Responsibilities For Senior AI Infrastructure Engineer - DGX Cloud

  • Design, build, deploy, and run internal tooling for large-scale AI training and inference platforms
  • Conduct performance characterization and analysis on large multi-GPU clusters
  • Engage in service lifecycle from design through deployment and refinement
  • Support services through system design consulting and tools development
  • Maintain services by monitoring availability, latency and system health
  • Scale systems through automation
  • Practice sustainable incident response
  • Participate in on-call rotation

Requirements For Senior AI Infrastructure Engineer - DGX Cloud

Python · Go · Linux · Kubernetes
  • BS degree in Computer Science or related technical field
  • 5+ years of experience
  • Experience with infrastructure automation and distributed systems design
  • Experience in Python, Go, C/C++, or Java
  • In-depth knowledge of Linux, networking, storage, and container technologies
  • Experience with public cloud and Infrastructure as Code (IaC) tools such as Terraform
  • Distributed systems experience

Benefits For Senior AI Infrastructure Engineer - DGX Cloud

  • Equity
