
Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

World leader in accelerated computing, pioneering AI and digital twins technology.
$144,000 - $270,250
Cloud
Senior Software Engineer
Remote
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Job Description

NVIDIA is seeking an experienced Senior DGX Cloud Software Engineer to join its Infrastructure Automation and Distributed Systems team. This role is central to supporting NVIDIA's AI training and inference development initiatives through the DGX Cloud platform. The position offers the opportunity to work on AI and cloud infrastructure at scale.

The role involves building and maintaining large-scale private and public cloud systems, with a focus on bare-metal, accelerated compute infrastructure. You'll be working with advanced technologies including BlueField networking, InfiniBand topologies, and the NVIDIA Collective Communications Library (NCCL). The position requires strong expertise in cloud infrastructure, distributed systems, and automation at scale.

As a senior engineer, you'll be responsible for designing and implementing cloud infrastructure services, participating in defining service level objectives, and maintaining high reliability standards. The role includes on-call responsibilities and requires a collaborative approach to problem-solving and system design.

NVIDIA offers competitive compensation with a base salary range of $144,000 - $270,250 USD (depending on level), plus equity and comprehensive benefits. The company is known for its innovative culture and commitment to pushing technological boundaries in AI, High-Performance Computing, and Visualization.

The ideal candidate will have 5+ years of relevant experience, strong programming skills in Python or Go, and deep knowledge of Linux and Kubernetes. Experience with ML/AI systems is a plus but not required. This role offers the opportunity to work on challenging problems at scale while contributing to NVIDIA's mission of accelerating the next wave of artificial intelligence.

Last updated 4 days ago

Responsibilities For Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

  • Design, build, and run cloud infrastructure services
  • Participate in defining internal service level objectives and error budgets
  • Eliminate or automate toil where ROI justifies it
  • Practice sustainable, blameless incident prevention and response
  • Participate in on-call rotation
  • Consult with peer teams on systems design best practices

Requirements For Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

Python · Go · Kubernetes · Linux
  • Proficiency in Python or Go
  • BS degree in Computer Science or a related technical field
  • 5+ years of experience in infrastructure and fleet management engineering
  • Experience with infrastructure automation and distributed systems design
  • Track record of project initiation and collaboration
  • In-depth knowledge of Linux, Slurm, Kubernetes, storage, and systems networking

Benefits For Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

  • Equity
