NVIDIA is seeking a Senior Performance Software Engineer to join its Deep Learning Libraries team. The role focuses on developing optimized code that accelerates linear algebra and deep learning operations on NVIDIA GPUs. The work involves libraries such as cuDNN, cuBLAS, and TensorRT, which are used to speed up deep learning models. The successful candidate will be instrumental in enabling breakthroughs in image classification, speech recognition, and natural language processing.
The role requires expertise in writing highly optimized compute kernels in CUDA C++, with a focus on core deep learning operations such as matrix multiplies, convolutions, and normalizations. You'll work at the lower levels of the deep learning software stack, interfacing directly with GPU hardware. Collaboration is key: you'll work with multiple NVIDIA teams, including the CUDA compiler team, deep learning performance teams, and hardware architecture teams.
This is an excellent opportunity for someone with strong parallel programming experience and a deep understanding of computer architecture. The ideal candidate should have at least 2 years of industry experience and advanced education in Computer Science or related fields. Experience with CUDA/OpenCL GPU programming, numerical methods, and linear algebra would be particularly valuable.
Join NVIDIA, the world leader in accelerated computing, and be part of a team that's driving the revolution in artificial intelligence. You'll work on projects that directly affect the performance of AI applications worldwide and contribute to open-source projects like CUTLASS.