NVIDIA, a pioneer in accelerated computing and AI technology for over 25 years, is seeking a Senior Site Reliability Engineer for its DGX Cloud initiative. The role is central to delivering a fully managed AI platform across major cloud providers and to optimizing AI workloads on NVIDIA infrastructure. The position involves managing large-scale Kubernetes clusters, ensuring system reliability, and maintaining high-performance DGX Cloud clusters for AI researchers and enterprise clients worldwide. The ideal candidate will have extensive experience in SRE practices, Kubernetes administration, and cloud platforms. This is an opportunity to work with cutting-edge AI infrastructure at a company that is transforming industries through AI and digital twin technology. The role offers the flexibility of remote work and the chance to make a significant impact on NVIDIA's cloud infrastructure. As an NVIDIAN, you'll join a diverse, supportive environment where innovation and technical excellence are paramount.