NVIDIA, a world leader in artificial intelligence computing and graphics technology, is seeking a Senior Site Reliability Engineer for its DGX Cloud platform. The role sits at the intersection of cloud infrastructure and AI computing and centers on NVIDIA's DGX systems, which power some of the most advanced AI workloads in the industry. The position requires expertise in cloud infrastructure and site reliability engineering practices, along with a deep understanding of distributed systems. As part of NVIDIA's cloud operations team, you will be responsible for the reliability, scalability, and performance of the DGX Cloud platform, which serves enterprise customers worldwide.

The role offers hands-on work with state-of-the-art AI and cloud technology at a company driving innovation across artificial intelligence, gaming, autonomous vehicles, and scientific computing. You will work on complex distributed systems at scale alongside a team of highly skilled engineers who are passionate about building reliable, performant cloud infrastructure for AI workloads.