We are seeking senior engineers with a focus on performance analysis and optimization to help maximize the efficiency of Deep Learning training, inference, and NVIDIA AI Services. Our work spans all layers of the hardware/software stack, from GPU architecture to Deep Learning frameworks, with the aim of achieving peak performance. This role offers a unique opportunity to directly impact the hardware and software roadmap in a rapidly growing company at the forefront of the AI revolution.
Join our team building software used globally, working alongside world-class engineers to implement blazingly fast state-of-the-art deep learning models. You'll contribute to understanding the end-to-end performance of NVIDIA's DL software and hardware stack, working on the most powerful, enterprise-grade GPU clusters capable of hundreds of petaFLOPS, and on unreleased hardware before anyone else in the world.
Key Responsibilities:
- Implement deep learning models across multiple data domains (CV, NLP/LLMs, ASR, TTS, RecSys, etc.) using various DL frameworks (PyTorch, JAX, TensorFlow 2, DGL, etc.)
- Develop and test new software features (e.g., Graph Compilation, reduced precision training) leveraging the latest hardware functionalities
- Analyze, profile, and optimize deep learning workloads on cutting-edge hardware and software platforms
- Collaborate with researchers and engineers across NVIDIA, providing guidance on improving workload design, usability, and performance
- Lead best practices for building, testing, and releasing DL software
Requirements:
- 5+ years of experience in DL model implementation and software development
- BS, MS, or PhD in Computer Science, Computer Architecture, Mathematics, Physics, or a related technical field (or equivalent experience)
- Excellent Python programming skills and extensive knowledge of at least one DL framework
- Strong problem-solving and analytical skills
- Solid understanding of algorithms and DL fundamentals
Preferred Qualifications:
- Experience in performance measurements and profiling
- Experience running large-scale workloads in HPC clusters
- Knowledge of DevOps/MLOps practices for Deep Learning-based product development
- Solid understanding of Linux environments and containerization technologies (e.g., Docker)
- GPU programming experience (CUDA or OpenCL) is a plus but not required
NVIDIA is an equal opportunity employer that values diversity and does not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.