NVIDIA is seeking a Senior Performance Software Engineer to join its Deep Learning Libraries team. This role is perfect for engineers passionate about parallel computing and performance optimization. You'll be at the forefront of AI acceleration, working on NVIDIA's cutting-edge GPU technologies and contributing to libraries like cuDNN, cuBLAS, and TensorRT.
The position involves developing highly optimized code for accelerating linear algebra and deep learning operations on NVIDIA GPUs. You'll be working directly with core technologies that enable breakthroughs in image classification, speech recognition, and natural language processing. The team takes pride in delivering high-performance code that powers the AI revolution globally.
As a member of this team, you'll be writing performance-critical compute kernels, collaborating with various teams across NVIDIA, and working on code that operates close to the GPU hardware level. The role offers exposure to NVIDIA's innovative Tensor Cores and involves working with projects like CUTLASS, NVIDIA's open-source matrix multiplication library.
The ideal candidate has strong C++ programming skills, experience with parallel programming, and a deep understanding of computer architecture. You'll be working in a collaborative environment, interfacing with compiler teams, performance teams, and hardware architects to push the boundaries of GPU efficiency.
This is an exceptional opportunity for someone who wants to make a significant impact in AI and deep learning, working with state-of-the-art technology at a company that's leading the accelerated computing revolution. The role offers the chance to work on fundamental software that powers AI applications worldwide, making it an exciting position for those passionate about performance engineering and deep learning.