NVIDIA, a global leader in accelerated computing and AI technology, is seeking a Senior Software Engineer for its DGX Cloud team. The role focuses on building and maintaining large-scale GPU infrastructure for AI workloads, combining expertise in distributed systems with cutting-edge AI technology. The position offers an opportunity to work with NVIDIA's industry-leading GPU technology and its Kubernetes-based infrastructure.
The work involves developing and maintaining production systems that enable scalable GPU clusters for AI workloads, implementing monitoring and health-management capabilities, and optimizing the performance of AI infrastructure. You'll work with Kubernetes APIs and frameworks rather than only operating clusters, and you'll be responsible for improving system reliability and performance.
As part of NVIDIA's team, you'll be at the forefront of AI computing innovation, working with state-of-the-art technology and contributing to solutions that power AI applications across various industries. The company offers competitive compensation, including a base salary range of $144,000 to $270,250, plus equity and comprehensive benefits.
The ideal candidate brings 5+ years of experience in similar roles, strong expertise in Kubernetes and distributed systems, and a solid foundation in computer science or a related field. You should be comfortable programming in languages such as Go and Python and have a proven track record of working with large-scale production systems.
This position offers the opportunity to work at one of the most respected companies in the technology sector, known for its innovation in GPU computing and AI. You'll be part of a team that is pushing the boundaries of AI infrastructure, in a collaborative environment that values creativity and independent thinking.