NVIDIA is seeking a Senior Deep Learning Software Engineer specializing in inference optimization to join its growing team. This role sits at the intersection of cutting-edge AI technology and high-performance computing, where you'll be instrumental in developing and optimizing the GPU-accelerated software that powers sophisticated AI applications.
The position involves working with LLM inference and serving frameworks such as SGLang and vLLM, which are crucial for efficient large-scale model serving. You'll be responsible for implementing and optimizing state-of-the-art LLM and generative AI models across NVIDIA's range of accelerators, from datacenter GPUs to edge SoCs.
As a senior engineer, you'll collaborate with the deep learning community to implement cutting-edge algorithms and drive performance improvements. The role requires expertise in C/C++ programming, deep learning frameworks, and GPU optimization techniques. You'll use CUTLASS, OpenAI Triton, NCCL, and custom CUDA kernels to build and optimize model serving pipelines.
The position offers a competitive salary range of $148,000 to $287,500 USD, along with equity and comprehensive benefits. NVIDIA is known for being one of the technology world's most desirable employers, offering opportunities to work on groundbreaking AI technologies that transform industries.
This role is perfect for someone with 5+ years of relevant experience, strong programming skills, and a deep understanding of AI/ML technologies. You'll be joining a forward-thinking team at NVIDIA's Santa Clara location, where you'll have the opportunity to impact the future of AI acceleration and inference optimization.
The ideal candidate will have a Master's degree or PhD in a relevant field, experience with deep learning model optimization, and a track record of contributing to significant software projects. Experience with multi-GPU communication and performance optimization would be particularly valuable.