Senior Software Engineer, Bare Metal Automation - DGX Cloud

NVIDIA is the world leader in accelerated computing, pioneering GPU technology and AI solutions.
$148,000 - $287,500
Backend
Senior Software Engineer
Remote
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Job Description

NVIDIA is seeking an experienced Senior Software Engineer to join its DGX Cloud team, focusing on bare metal automation for AI infrastructure. This role combines hardware expertise with software engineering and requires deep knowledge of GPU systems and distributed computing. The position involves managing large-scale GPU clusters, implementing monitoring solutions, and ensuring optimal performance of AI workloads.

The ideal candidate will have 5+ years of experience working with large-scale production systems and bare metal hardware. They should be proficient in systems programming languages like Go and Python, with a strong foundation in computer science fundamentals. The role offers an opportunity to work at the cutting edge of AI infrastructure, contributing to NVIDIA's mission of advancing GPU computing technology.

Working at NVIDIA means joining one of technology's most desirable employers, with a chance to impact the future of AI computing. The company offers competitive compensation, including a base salary range of $148,000 - $287,500 (depending on level), equity, and comprehensive benefits. The position offers flexibility with remote work options while being part of a team that's pushing the boundaries of what's possible in AI and GPU computing.

This role is perfect for someone who combines technical expertise with a passion for infrastructure automation, has strong problem-solving abilities, and thrives in a collaborative environment. You'll be working with cutting-edge technology, helping to scale and optimize NVIDIA's AI infrastructure while contributing to the company's position as a leader in accelerated computing.

Last updated 7 days ago

Responsibilities For Senior Software Engineer, Bare Metal Automation - DGX Cloud

  • Work on the DGX Cloud team managing production systems for large-scale GPU clusters
  • Implement monitoring and health management capabilities for GPU assets
  • Manage fleets of GPU nodes
  • Work with teams across NVIDIA to ensure production AI clusters run reliably
  • Evaluate system failures and improve services through the incident management process

Requirements For Senior Software Engineer, Bare Metal Automation - DGX Cloud

  • 5+ years of experience in a similar role with large-scale production systems
  • BS in Computer Science, Engineering, Physics, or Mathematics, or equivalent experience
  • Software development experience with bare metal hardware APIs and frameworks
  • Strong communication skills and ability to work with cross-functional teams
  • Proficiency in systems programming languages (Go, Python)
  • Solid understanding of data structures and algorithms

Benefits For Senior Software Engineer, Bare Metal Automation - DGX Cloud

  • Equity
  • Comprehensive benefits package