NVIDIA is seeking a Senior AI Infrastructure Engineer for its DGX Cloud group to design and maintain large-scale production systems. The role combines software and systems engineering and requires expertise in systems, networking, coding, database management, and cloud technologies. As part of the DGX Cloud SRE team, you'll ensure reliable GPU cloud services while managing system changes and capacity.
The position offers an opportunity to work with cutting-edge AI infrastructure at NVIDIA, a leader in accelerated computing and AI technology. You'll be responsible for building and maintaining the backbone of NVIDIA's AI training and inferencing platforms, working with multi-GPU clusters and distributed systems.
The role demands both technical expertise and collaborative skills, with opportunities to influence system design and implementation. NVIDIA's culture emphasizes diversity, intellectual curiosity, and problem-solving in a blame-free environment. The company encourages self-direction while providing support and mentorship for professional growth.
This is an excellent opportunity for experienced engineers passionate about large-scale distributed systems and AI infrastructure. The position offers competitive compensation, including a base salary range of $184,000 - $356,500 (depending on level), equity, and benefits. NVIDIA's status as a technology leader and its commitment to groundbreaking developments in AI and High-Performance Computing make the role a strong fit for those looking to make an impact in the field of AI infrastructure.