Senior Research Engineer, Foundation Model Training Infrastructure

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twin technology.
$224,000 - $356,500
Machine Learning
Staff Software Engineer
In-Person
5,000+ Employees
10+ years of experience
AI · Robotics

Description For Senior Research Engineer, Foundation Model Training Infrastructure

NVIDIA is seeking a senior or principal engineer to join its Generalist Embodied Agent Research (GEAR) group and work on Project GR00T, its ambitious initiative to develop foundation models and technology for humanoid robots. The role combines AI infrastructure development with robotics research and involves working alongside a team that has produced influential works such as Eureka, VIMA, Voyager, and MineDojo.

The position requires deep expertise in distributed systems and machine learning infrastructure, with responsibilities spanning from designing large-scale training systems to optimizing GPU cluster performance. The ideal candidate will have extensive experience (10+ years) in MLOps and AI infrastructure, with strong programming skills in Python and C++, and deep knowledge of GPU acceleration and cluster management.

NVIDIA, recognized as one of technology's most desirable employers, offers a competitive compensation package including a base salary range of $224,000 - $356,500 USD, plus equity and comprehensive benefits. The role is based in Santa Clara, CA, placing you at the heart of NVIDIA's innovation center.

This is a unique opportunity to contribute to groundbreaking research in robotics and AI, working with state-of-the-art technology and infrastructure. The position combines technical leadership with hands-on development, requiring both deep technical expertise and the ability to collaborate with researchers to implement cutting-edge architectures.

Join NVIDIA's mission to advance general-purpose robots and large-scale foundation models, working with some of the most forward-thinking professionals in the industry. The company's commitment to diversity and inclusion, coupled with its position at the forefront of AI and accelerated computing, makes this an exceptional opportunity for experienced engineers looking to make a significant impact in AI and robotics.


Responsibilities For Senior Research Engineer, Foundation Model Training Infrastructure

  • Design and maintain large-scale distributed training systems for multi-modal foundation models for robotics
  • Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets
  • Implement scalable data loaders and preprocessors for multimodal datasets
  • Develop robust monitoring and debugging tools for training workflows on large GPU clusters
  • Collaborate with researchers to integrate cutting-edge model architectures into scalable training pipelines

Requirements For Senior Research Engineer, Foundation Model Training Infrastructure

Python
Kubernetes
  • Bachelor's degree in Computer Science, Robotics, Engineering, or related field
  • 10+ years of full-time industry experience in large-scale MLOps and AI infrastructure
  • Experience designing and optimizing distributed training systems with PyTorch, JAX, or TensorFlow
  • Deep understanding of GPU acceleration, CUDA programming, and cluster management tools
  • Strong programming skills in Python and C++
  • Strong experience with large-scale GPU clusters, HPC environments, and job scheduling tools

Benefits For Senior Research Engineer, Foundation Model Training Infrastructure

  • Competitive base salary
  • Equity compensation
  • Comprehensive benefits package

