
Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

World leader in accelerated computing, pioneering AI and digital twin technology to transform industries.
$144,000 - $270,250
Cloud
Senior Software Engineer
Remote
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

NVIDIA is seeking experienced Software Engineers to join its DGX Cloud team, which builds and runs private and public clouds at production scale. The role supports customers' AI training and inference development by building platforms, tools, and services that maintain the operational capacity of bare-metal, accelerated compute infrastructure. It combines cloud infrastructure expertise with AI technology and involves significant work with distributed systems, infrastructure automation, and reliability engineering practices.

Compensation includes a base salary range of $144,000-$270,250, equity, and comprehensive benefits. NVIDIA is a leader in accelerated computing and AI, known for an innovative culture and for its work in AI, High-Performance Computing, and Visualization. The role suits engineers who want to work on large-scale systems that power next-generation AI and computing infrastructure.

Last updated 2 months ago

Responsibilities For Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

  • Design, build, and run cloud infrastructure services
  • Participate in defining internal service level objectives and error budgets
  • Eliminate or automate toil where ROI justifies it
  • Practice sustainable, blameless incident prevention and response
  • Participate in on-call rotation
  • Consult with peer teams on systems design best practices

Requirements For Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

Python · Go · Kubernetes · Linux
  • Proficiency in Python or Go
  • BS degree in Computer Science or related technical field
  • 5+ years of experience in infrastructure and fleet management engineering
  • Experience with infrastructure automation and distributed systems design
  • Track record of project initiation and collaboration
  • In-depth knowledge of Linux, Slurm, Kubernetes, Local and Distributed Storage, and Systems Networking

Benefits For Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

  • Equity
  • Comprehensive benefits package

