NVIDIA, a global leader in accelerated computing and AI technology, is seeking a Senior Site Reliability Engineer to join its AI Infrastructure team. This role combines large-scale system development with AI infrastructure work. As an SRE, you'll be responsible for maintaining and optimizing critical systems that power NVIDIA's AI capabilities across global public and private clouds. The position offers the opportunity to work with state-of-the-art technology while implementing SRE best practices, including incident management, monitoring, and performance optimization.
The ideal candidate will bring 12+ years of experience in software development or SRE, along with strong expertise in Python programming and cloud platforms. You'll be working in an environment that values innovation, continuous learning, and technical excellence. The role involves not just technical work but also mentoring peers and contributing to a diverse, high-performing team.
NVIDIA offers a unique opportunity to work at the intersection of SRE and AI, where you'll be handling sophisticated infrastructure that powers some of the most advanced AI systems in the world. The company's culture encourages creativity, autonomy, and forward-thinking, making it one of the technology world's most desirable employers. You'll be part of a team that's defining the next era of computing, working on systems that power computers, robots, and self-driving cars that can understand the world.
This position provides the chance to make a lasting impact on the world while working with cutting-edge technology and outstanding colleagues. The role offers exposure to deep learning frameworks and AI training and inference systems, as well as the opportunity to work on distributed systems with stringent SLAs. If you're passionate about reliability engineering and want to be at the forefront of AI infrastructure, this role at NVIDIA is an opportunity to advance your career while contributing to groundbreaking technology.