NVIDIA is seeking an experienced Senior HPC AI Cluster Engineer to join its E2E (end-to-end) software verification HPC/AI Infrastructure team. This role is an exciting opportunity to work at the forefront of accelerated computing and artificial intelligence, building and maintaining supercomputers and HPC clusters based on cutting-edge technologies.
The position combines deep technical expertise in HPC systems with hands-on engineering work, requiring skills across system architecture, infrastructure automation, and performance optimization. You'll be working with the latest accelerated computing and deep learning platforms, collaborating with scientific researchers and developers to improve workflows and develop innovative solutions.
As a Senior HPC AI Cluster Engineer, you'll be responsible for designing and implementing large-scale HPC/AI clusters, managing workload orchestration, developing automation tools, and ensuring optimal system performance. The role requires expertise in Linux systems, networking protocols, storage solutions, and modern DevOps practices.
As the world leader in accelerated computing, NVIDIA offers an environment where you'll work with state-of-the-art technology and contribute to breakthroughs in AI and GPU computing. The company's focus on innovation and technical excellence makes this an ideal position for someone passionate about high-performance computing and artificial intelligence.
The role offers the opportunity to work with multiple teams across the organization, providing technical leadership and developing standardized methodologies. You'll be involved in research and development activities, participating in proofs of concept for future improvements and helping shape the future of HPC/AI infrastructure.
This position is perfect for a seasoned engineer who combines strong technical skills with a strategic mindset, capable of both hands-on implementation and high-level system architecture. The role offers significant growth potential and the chance to work on some of the most advanced computing systems in the industry.