AI and ML Infra Software Engineer, GPU Clusters

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twin technology.
$184,000 - $356,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
8+ years of experience
AI · Enterprise SaaS

Job Description

NVIDIA, a global leader in accelerated computing and AI technology, is seeking an AI/ML Infrastructure Software Engineer to join its Hardware Infrastructure team. The role combines AI technology with infrastructure optimization and directly affects the efficiency of NVIDIA's research work. It involves working with state-of-the-art GPU clusters and implementing solutions for AI/ML workloads.

The role requires deep expertise in both AI/ML technologies and infrastructure management, with responsibilities ranging from performance optimization to collaboration with research teams. You'll work with GPU computing, high-performance storage systems, and modern container orchestration platforms. The base salary range is $184,000 - $356,500 USD, depending on level and experience.

NVIDIA's culture emphasizes innovation, continuous learning, and collaborative problem-solving. The company has been at the forefront of transforming computer graphics, PC gaming, and accelerated computing for over 25 years. As an NVIDIAN, you'll be part of a diverse, supportive environment where everyone is inspired to do their best work and make a lasting impact on the world.

The ideal candidate will bring 8+ years of experience in AI/ML and HPC workloads, strong programming skills in languages such as Python and Go, and expertise in distributed training systems. The role offers the chance to work with leading researchers and engineers while helping to shape the future of AI/ML technology.

Responsibilities For AI and ML Infra Software Engineer, GPU Clusters

  • Collaborate with AI and ML research teams to understand infrastructure needs and implement improvements
  • Monitor and optimize infrastructure performance for high availability and scalability
  • Define and improve AI researcher efficiency metrics
  • Collaborate with researchers, data engineers, and DevOps professionals
  • Stay current with the latest AI/ML technologies and promote their adoption

Requirements For AI and ML Infra Software Engineer, GPU Clusters

Skills: Python, Kubernetes, Go
  • BS or equivalent in Computer Science or related field with 8+ years of experience in AI/ML and HPC workloads
  • Experience with HPC infrastructure, GPU computing, storage systems, and container technologies
  • Expertise in distributed training workloads using PyTorch, NeMo, or JAX
  • Proficiency in Python, Go, Bash, and cloud computing platforms
  • Strong communication and collaboration skills
  • Deep understanding of AI/ML workflows and pipelines

Benefits For AI and ML Infra Software Engineer, GPU Clusters

  • Medical insurance
  • Competitive base salary
  • Equity compensation
  • Comprehensive benefits package
