NVIDIA, a global leader in accelerated computing and AI technology, is seeking a Senior Software Engineer for its Cloud Functions team. This role focuses on developing NVIDIA Mission Control Software that runs on superpods and on creating an autonomous hardware recovery engine responsible for baseline validation tests, remedial actions, and hardware health monitoring. The position offers an opportunity to work with cutting-edge technology in AI and high-performance computing.
The role involves building and improving a platform that automates the diagnosis and repair of GPU/CPU clusters across public clouds, private clouds, and various hardware configurations. You'll implement scalable software components, enable Agentic AI for remedial workflows, and develop robust feedback control systems for hardware management.
As a Senior Software Engineer, you'll collaborate with teams across NVIDIA to drive platform adoption and improve GPU utilization. The position requires expertise in modern programming languages such as Go and Rust, along with a deep understanding of distributed systems and multi-threading concepts. You'll lead high-impact projects and influence the product roadmap to enhance hardware utilization and reduce SRE toil.
NVIDIA offers a competitive compensation package with a base salary range of $184,000 - $356,500 USD (depending on level), plus equity and comprehensive benefits. The company is known for being one of the technology industry's most desirable employers, offering opportunities to work on groundbreaking developments in AI, High-Performance Computing, and Visualization. This is an excellent opportunity for creative engineers who enjoy autonomy and are passionate about developing cloud services at the forefront of technology.