Senior DevOps and Automation Engineer, Fabric Networking - GPU

World leader in accelerated computing, pioneering AI and digital twin technology.
$148,000 - $287,500
DevOps
Senior Software Engineer
Remote
5+ years of experience
AI · Enterprise SaaS

Description For Senior DevOps and Automation Engineer, Fabric Networking - GPU

NVIDIA, the pioneer in accelerated computing and inventor of the GPU, is seeking a Senior DevOps and Automation Engineer for its Fabric Networking - GPU team. The role is central to developing and maintaining software that enables GPU communication for High Performance Computing and Deep Learning solutions.

The position involves working with cutting-edge technology, including large GPU clusters interconnected via NVLink and InfiniBand. You'll be responsible for developing automated tools for cluster deployment, implementing modern DevOps practices, and ensuring optimal cluster performance. This role combines hands-on technical expertise with collaborative teamwork across multiple time zones.

The ideal candidate will bring strong expertise in automation tools like Ansible and Python, along with deep knowledge of Linux systems and cluster management. Experience with GPU-focused hardware and software, particularly DGX systems and Compute Clusters, would be highly valuable. The role offers exposure to groundbreaking developments in Artificial Intelligence and High-Performance Computing.

NVIDIA offers a competitive compensation package, including a base salary range of $148,000 - $287,500 USD, equity, and comprehensive benefits. This is an opportunity to join a company at the forefront of AI and accelerated computing, working on technology that powers everything from artificial intelligence to autonomous vehicles. The position offers flexibility with remote work options while being part of a team that's driving innovation in the industry.

Last updated 4 months ago

Responsibilities For Senior DevOps and Automation Engineer, Fabric Networking - GPU

  • Develop automated tools to deploy, provision, and maintain GPU clusters with NVLink and InfiniBand
  • Implement DevOps tools for software updates, maintenance, and cluster monitoring
  • Troubleshoot and resolve day-to-day cluster failures
  • Manage the rollout of cluster software and firmware updates
  • Collaborate with Engineering and Product Teams across multiple time zones

Requirements For Senior DevOps and Automation Engineer, Fabric Networking - GPU

Python
Linux
Kubernetes
  • BS or MS in Computer Science, Computer Engineering, Electrical Engineering, or related field
  • 5+ years of experience deploying and administering clusters, servers, and infrastructure
  • Expertise in Ansible, Python, and shell scripting
  • Deep understanding of operating systems, computer networks, and high-performance applications
  • Proven ability to work with cross-functional teams
  • Proficient with Linux fundamentals

Benefits For Senior DevOps and Automation Engineer, Fabric Networking - GPU

  • Equity
