NVIDIA is seeking a Senior AI Infrastructure Engineer for its DGX Cloud group, focusing on designing and maintaining large-scale production systems. This role combines software and systems engineering, requiring expertise in systems, networking, coding, database management, and cloud technologies. The position is part of NVIDIA's DGX Cloud SRE team, which keeps GPU cloud services reliable while managing system changes and capacity.
The role involves working with cutting-edge AI infrastructure, including multi-GPU and multi-node clusters, making it an exciting opportunity for those passionate about high-performance computing and AI technologies. NVIDIA's culture emphasizes diversity, intellectual curiosity, and problem-solving in a blame-free environment that encourages collaboration and innovation.
As a Senior AI Infrastructure Engineer, you'll be responsible for building and maintaining the backbone of NVIDIA's AI training and inference platform. This includes designing tooling systems, conducting performance analysis, and ensuring system reliability through monitoring and automation. The position balances technical challenges with operational responsibilities, including participation in on-call rotations.
NVIDIA's position as a leader in accelerated computing and AI makes this role particularly impactful. The company's work is transforming major industries through AI and digital twin technology. The compensation package is competitive, with a base salary range of $148,000 to $287,500, plus equity and comprehensive benefits.
The ideal candidate will bring strong technical skills in distributed systems, cloud infrastructure, and programming, combined with excellent problem-solving and communication abilities. This role offers the opportunity to work with some of the most advanced AI infrastructure while contributing to NVIDIA's mission of pushing the boundaries of technology.