We are seeking senior engineers focused on performance analysis and optimization to help maximize the efficiency of deep learning training, inference, and NVIDIA AI Services. Our work spans every layer of the hardware/software stack, from GPU architecture to deep learning frameworks, with the goal of achieving peak performance. This role offers a unique opportunity to directly impact the hardware and software roadmap in a rapidly growing company at the forefront of the AI revolution.

Join our team building software used globally, working alongside world-class engineers to implement blazingly fast, state-of-the-art deep learning models. You'll contribute to understanding the end-to-end performance of NVIDIA's DL software and hardware stack, working on the most powerful, enterprise-grade GPU clusters capable of hundreds of petaFLOPS, and on unreleased hardware before anyone else in the world.

Key Responsibilities:

Implement deep learning models across multiple data domains (CV, NLP/LLMs, ASR, TTS, RecSys, etc.) using various DL frameworks (PyTorch, JAX, TensorFlow 2, DGL, etc.)
Develop and test new software features (e.g., graph compilation, reduced-precision training) that leverage the latest hardware capabilities
Analyze, profile, and optimize deep learning workloads on cutting-edge hardware and software platforms
Collaborate with researchers and engineers across NVIDIA, providing guidance on improving workload design, usability, and performance
Lead best practices for building, testing, and releasing DL software

Requirements:

5+ years of experience in DL model implementation and software development
BSc, MS, or PhD in Computer Science, Computer Architecture, Mathematics, Physics, or related technical field (or equivalent experience)
Excellent Python programming skills and extensive knowledge of at least one DL framework
Strong problem-solving and analytical skills
Solid understanding of algorithms and DL fundamentals

Preferred Qualifications:

Experience in performance measurements and profiling
Experience running large-scale workloads in HPC clusters
Knowledge of DevOps/MLOps practices for Deep Learning-based product development
Solid understanding of Linux environments and containerization technologies (e.g., Docker)
GPU programming experience (CUDA or OpenCL) is a plus but not required

NVIDIA is an equal opportunity employer that values diversity and does not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Senior Deep Learning Performance Engineer - Training at Scale

NVIDIA
