NVIDIA is seeking a Senior Performance Software Engineer to join its Deep Learning Libraries team. The role focuses on developing optimized code that accelerates linear algebra and deep learning operations on NVIDIA GPUs, contributing to libraries such as cuDNN, cuBLAS, and TensorRT. This work supports advances in image classification, speech recognition, and natural language processing. The team works on low-level GPU optimization, writing highly efficient code for current and future-generation GPUs, in close collaboration with other NVIDIA teams, including the CUDA compiler team, deep learning performance teams, and hardware architecture teams. The ideal candidate has strong C++ programming skills, experience with parallel programming, and a deep understanding of computer architecture. The role is remote-friendly, with options to work from several European locations. It offers the opportunity to work at one of technology's most desirable employers and to contribute directly to cutting-edge developments in AI computing, deep learning, and hardware optimization.