NVIDIA is seeking a Senior AI Infrastructure Engineer for its DGX Cloud team to ensure maximum reliability and uptime of GPU cloud services. This role combines SRE principles with cutting-edge AI infrastructure, focusing on building tooling, reporting, and automation to enable operational excellence. The position offers an opportunity to work with state-of-the-art technology at NVIDIA, a leader in accelerated computing and AI innovation.
The role involves designing and implementing critical infrastructure tools and data pipelines that directly impact business decisions at the executive level. You'll be working with cloud infrastructure, incident management systems, and modern DevOps tools while collaborating with various teams to improve operational efficiency.
As a Senior AI Infrastructure Engineer, you'll be responsible for maintaining high-reliability systems while enabling developer productivity. The position requires a strong background in distributed systems, infrastructure automation, and programming languages like Python, Go, or TypeScript. Knowledge of Kubernetes, Terraform, and ML concepts is highly valued.
NVIDIA offers a competitive compensation package with a base salary range of $144,000 - $270,250 USD, plus equity and comprehensive benefits. The company is known for its innovative work in AI, High-Performance Computing, and Visualization, making it an ideal place for those passionate about cutting-edge technology and scalable infrastructure.
The role offers both technical challenges and leadership opportunities, requiring someone who can balance independent initiative with strong collaboration skills. Working at NVIDIA means being at the forefront of AI and cloud computing innovation, with the chance to impact how some of the world's most advanced computing systems are operated and maintained.