NVIDIA, a global leader in accelerated computing and AI technology, is seeking a System Software Engineer for its Platform Compute team. This role is central to managing and scaling the company's multi-cloud training delivery platform, which spans 3-4 cloud service providers and approximately 50 regions. The position offers an opportunity to work on systems that enable AI learning and development at massive scale.
The role combines DevOps expertise with platform engineering and requires deep knowledge of cloud infrastructure, containerization, and automation. You'll be responsible for keeping critical training infrastructure running 24/7 while optimizing costs and preventing compute capacity shortages. These responsibilities are especially important because the platform faces a potential 10x increase in training demand.
As a core member of the learning systems platform team, you'll work alongside experts and educators to create scalable, reliable learning experiences. The position involves building and maintaining sophisticated cloud infrastructure using technologies such as Kubernetes, Terraform, and Python across multiple cloud providers, including AWS, Azure, and GCP.
The ideal candidate will bring 8+ years of DevOps experience, strong technical skills in cloud infrastructure, and excellent problem-solving abilities. You'll be working on cutting-edge AI learning platforms, making advanced technologies accessible to learners worldwide. The role offers competitive compensation, including a base salary range of $168,000 - $322,000 (depending on level), equity, and comprehensive benefits.
NVIDIA provides an exceptional work environment, consistently ranked as one of the most desirable employers in the technology sector. This position offers the opportunity to make a significant impact on how people learn and apply AI technologies, while working with some of the industry's most innovative minds in a rapidly growing field.