NVIDIA, the pioneer in accelerated computing and inventor of the GPU, is seeking a Senior DevOps and Automation Engineer to join its software infrastructure team. The role supports large-scale GPU clusters interconnected via NVLink and InfiniBand for HPC and AI workloads. It offers an opportunity to work at the forefront of artificial intelligence and high-performance computing, building and enhancing the systems that power developments in AI, HPC, and visualization.
The ideal candidate will be responsible for developing and maintaining CI/CD pipelines, creating automation workflows, and managing infrastructure for GPU clusters. They will work with NVIDIA's DGX/HGX systems and implement observability tools such as Prometheus and Grafana. The role requires expertise in Python, Ansible, and shell scripting, along with a strong understanding of Linux and distributed systems.
Working at NVIDIA means being part of a company that is transforming industries through AI and digital twin technology. The position offers exposure to cutting-edge technology and the chance to work with global engineering teams. NVIDIA's commitment to diversity and inclusion ensures a welcoming environment for all employees. This role suits someone who is passionate about infrastructure automation and system reliability and who wants to contribute to the next wave of artificial intelligence development.