
Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twins technology.
$144,000 - $333,500
Cloud
Senior Software Engineer
Remote
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

NVIDIA is seeking an experienced Senior DGX Cloud Software Engineer to join its Infrastructure Automation and Distributed Systems team. This role is central to supporting NVIDIA's AI training and inference development initiatives by building and maintaining platforms, tools, and services for its bare-metal, accelerated compute infrastructure.

The position offers an exciting opportunity to work at the intersection of cloud infrastructure and AI computing, where you'll be responsible for designing and implementing large-scale systems that power NVIDIA's DGX Cloud platform. You'll be working with cutting-edge technologies including Kubernetes, Linux, and distributed systems, while applying best practices in infrastructure automation and reliability engineering.

As a senior engineer, you'll have the chance to make significant technical contributions while collaborating with talented peers across the organization. The role involves both hands-on development and strategic thinking, requiring expertise in languages like Python or Go, and deep knowledge of cloud infrastructure components.

NVIDIA offers a competitive compensation package with a base salary range of $144,000 to $333,500 USD, plus equity and comprehensive benefits. The company is known for its innovative culture and leadership in AI and accelerated computing, making it an ideal place for engineers passionate about building the future of computing infrastructure.

The position offers flexibility with remote work options while being part of a team that's directly impacting the advancement of AI technology. You'll be working on challenging technical problems at scale, with access to NVIDIA's cutting-edge hardware and software stack. This is an excellent opportunity for experienced infrastructure engineers looking to make a significant impact in the AI and cloud computing space.


Responsibilities For Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

  • Design, build, and run cloud infrastructure services
  • Participate in defining internal-facing service level objectives and error budgets
  • Eliminate or automate toil where ROI justifies it
  • Practice sustainable, blameless incident prevention and response
  • Participate in on-call rotation
  • Consult with peer teams on systems design best practices

Requirements For Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

  • Proficiency in Python or Go
  • BS degree in Computer Science or related technical field
  • 5+ years of experience in infrastructure and fleet management engineering
  • Experience with infrastructure automation and distributed systems design
  • Track record of project initiation and collaboration
  • In-depth knowledge of Linux, Slurm, Kubernetes, local and distributed storage, and systems networking

Benefits For Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

  • Equity
  • Additional benefits mentioned but not specified in detail
