Senior Performance Software Engineer, Deep Learning Libraries

NVIDIA

NVIDIA is the world leader in accelerated computing, pioneering solutions for AI and digital twins.

Shanghai, China • Beijing, China

Machine Learning

Senior Software Engineer

In-Person

5,000+ Employees

2+ years of experience

Job Description

NVIDIA is seeking a Senior Performance Software Engineer to join its Deep Learning Libraries team. This role is well suited to engineers passionate about parallel computing and performance optimization. You'll be at the forefront of AI acceleration, working on NVIDIA's cutting-edge GPU technologies and contributing to libraries like cuDNN, cuBLAS, and TensorRT.

The position involves developing highly optimized code for accelerating linear algebra and deep learning operations on NVIDIA GPUs. You'll be working directly with core technologies that enable breakthroughs in image classification, speech recognition, and natural language processing. The team takes pride in delivering high-performance code that powers the AI revolution globally.

As a member of this team, you'll be writing performance-critical compute kernels, collaborating with various teams across NVIDIA, and working on code that operates close to the GPU hardware level. The role offers exposure to NVIDIA's innovative Tensor Cores and involves working with projects like CUTLASS, NVIDIA's open-source matrix multiplication library.

The ideal candidate has strong C++ programming skills, experience with parallel programming, and a deep understanding of computer architecture. You'll be working in a collaborative environment, interfacing with compiler teams, performance teams, and hardware architects to push the boundaries of GPU efficiency.

This is an exceptional opportunity for someone who wants to make a significant impact in the field of AI and deep learning, working with state-of-the-art technology at a company that's leading the accelerated computing revolution. The role offers the chance to work on fundamental software that powers AI applications worldwide, making it an exciting position for those passionate about performance engineering and deep learning.

Last updated 16 days ago

Responsibilities For Senior Performance Software Engineer, Deep Learning Libraries

Writing highly tuned compute kernels for core deep learning operations
Following software engineering best practices including regression testing and CI/CD flows
Collaborating with CUDA compiler team on optimal assembly code
Working with deep learning training and inference performance teams
Collaborating with hardware and architecture teams on programming models

Requirements For Senior Performance Software Engineer, Deep Learning Libraries

Master's or PhD degree or equivalent experience in Computer Science, Computer Engineering, Applied Math, or related field
2+ years of relevant industry experience
Strong C++ programming and software design skills
Experience with performance-oriented parallel programming
Solid understanding of computer architecture and assembly programming
Ability to identify bottlenecks, optimize resource utilization, and improve throughput