
Senior Software Engineer - GPU Clusters

World leader in accelerated computing, pioneering AI and digital twin technology.
$180,000 - $339,250
Cloud
Senior Software Engineer
In-Person
7+ years of experience
AI · Enterprise SaaS
This job posting may no longer be active.

Description For Senior Software Engineer - GPU Clusters

NVIDIA, a pioneer in GPU technology and AI innovation, is seeking a Senior Software Engineer to lead its GPU clusters initiative. This role sits at the intersection of high-performance computing and artificial intelligence: you'll design and manage large-scale GPU clusters that power cutting-edge AI workloads.

The position offers an opportunity to work with state-of-the-art technology in a company that's driving the future of AI and computing. You'll be joining a team that values operational excellence and innovation, working on infrastructure that directly impacts the development of next-generation AI solutions.

As a Senior Software Engineer, you'll be responsible for ensuring the reliability and efficiency of GPU clusters across multiple cloud platforms and on-premises environments. This includes implementing automation, maintaining high availability, and continuously improving infrastructure performance. You'll work with technologies like Kubernetes, various cloud platforms (AWS, GCP, Azure, OCI), and modern DevOps tools.

The ideal candidate brings strong technical expertise in cloud infrastructure, containerization, and programming, combined with experience in GPU or high-performance computing environments. You'll need excellent problem-solving skills and the ability to work effectively in a fast-paced, collaborative environment.

This role offers competitive compensation, including a base salary range of $180,000 to $339,250, plus equity. You'll be working at the forefront of AI technology, contributing to infrastructure that powers groundbreaking developments in artificial intelligence, autonomous vehicles, and high-performance computing.

Join NVIDIA to be part of a team that's shaping the future of computing and AI, while working with some of the most advanced technology in the industry. This position provides an excellent opportunity for growth and impact in a company that's leading the AI revolution.

Last updated 7 months ago

Responsibilities For Senior Software Engineer - GPU Clusters

  • Design, deploy and support large-scale, distributed GPU clusters for AI and ML workloads
  • Improve infrastructure provisioning, management, and monitoring through automation
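  • Ensure high uptime and quality of service (QoS) through operational excellence
  • Support globally distributed cloud environments (AWS, GCP, Azure, OCI) and on-premises environments
  • Define and implement service level objectives (SLOs) and service level indicators (SLIs)
  • Write root cause analysis (RCA) reports for production incidents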
  • Participate in on-call rotation
  • Drive evaluation and integration of new GPU technologies

Requirements For Senior Software Engineer - GPU Clusters

Python · Go · Kubernetes · Linux
  • BS degree in Computer Science or equivalent experience
  • 7+ years of software engineering experience
  • 3+ years managing GPU clusters or similar environments
  • Expertise in production-level cloud services
  • Proficiency with Kubernetes, Docker, or similar tools
  • Experience in Python, Go, or Ruby programming
  • Strong Linux and TCP/IP knowledge
  • Proficiency in CI/CD, GitOps, and Infrastructure as Code
  • Strong communication and documentation skills

Benefits For Senior Software Engineer - GPU Clusters

  • Equity
