Senior High-Performance LLM Training Engineer

NVIDIA is the world leader in accelerated computing, pioneering solutions in AI and digital twins.
$184,000 - $356,500
Machine Learning
Senior Software Engineer
Hybrid
8+ years of experience
AI
This job posting is no longer active. 😔

Job Description

NVIDIA is seeking a Senior High-Performance LLM Training Engineer to join their team in shaping the world's most advanced computing systems. This role focuses on optimizing NVIDIA's high-performance LLM software stack in frameworks like PyTorch and JAX for training on thousands of GPUs, while also influencing hardware roadmaps for next-generation GPUs powering the AI revolution.

The position offers an opportunity to work at the intersection of deep learning and high-performance computing, optimizing training workloads for maximum efficiency. You'll be working with state-of-the-art neural networks and implementing solutions across NVIDIA's entire deep learning platform stack, from drivers to frameworks.

NVIDIA, widely regarded as one of tech's most desirable employers, offers competitive compensation and comprehensive benefits. The role provides the unique opportunity to collaborate with forward-thinking professionals in shaping the future of AI. The work environment encourages innovation and creativity, with the freedom to work autonomously on challenging problems.

The company is at the forefront of GPU computing, which serves as the foundation for deep learning and AI advancement. NVIDIA's technology powers everything from data centers to edge devices, including self-driving cars and autonomous robots. Their work has the potential to drive unprecedented social progress, comparable to the industrial revolution.

This role is perfect for someone passionate about performance optimization, with a deep understanding of both hardware and software aspects of AI systems. You'll be working in a hybrid environment, contributing to groundbreaking developments in AI while enjoying the benefits of working for a leading technology innovator.

Last updated 9 months ago

Responsibilities For Senior High-Performance LLM Training Engineer

  • Understand, analyze, profile, and optimize AI training workloads on innovative hardware and software platforms
  • Understand the big picture of training performance on GPUs, prioritizing and solving problems across all state-of-the-art neural networks
  • Implement production-quality software in multiple layers of NVIDIA's deep learning platform stack
  • Build and support NVIDIA submissions to the MLPerf Training benchmark suite
  • Implement key DL training workloads in NVIDIA's proprietary processor and system simulators
  • Build tools to automate workload analysis, workload optimization, and other critical workflows

Requirements For Senior High-Performance LLM Training Engineer

  • PhD in Computer Science, Electrical Engineering, or Computer Engineering and 5+ years of meaningful work experience; or MS (or equivalent experience) and 8+ years of meaningful work experience
  • Strong background in deep learning and neural networks, in particular training
  • Deep background in computer architecture and familiarity with GPU architecture fundamentals
  • Proven experience analyzing and tuning application performance, as well as processor- and system-level performance modeling
  • Programming skills in C++, Python, and CUDA

Benefits For Senior High-Performance LLM Training Engineer

  • Equity