NVIDIA is seeking a Senior Deep Learning Software Engineer specializing in inference to join its growing team. This role focuses on designing, building, and optimizing GPU-accelerated software that powers sophisticated AI applications. The position involves working with inference frameworks such as SGLang and vLLM, which are used for efficient large-scale model serving.
The role requires expertise in optimizing the performance of deep learning models, particularly in the LLM and Generative AI domains. You'll work with technologies including CUTLASS, OpenAI Triton, NCCL, and CUDA kernels to implement and optimize model serving pipelines. The position offers an opportunity to contribute to public frameworks and drive performance improvements across NVIDIA's range of accelerators.
At NVIDIA, widely recognized as one of technology's most desirable employers, you'll join a team of forward-thinking professionals working on breakthrough technologies. The company offers highly competitive compensation, including a base salary range of $148,000 to $287,500 (depending on level), equity, and comprehensive benefits.
The ideal candidate has 5+ years of software development experience and strong C/C++ programming skills; experience with GPU programming and deep learning model optimization is preferred. A Master's or PhD in Computer Science, Computer Engineering, or a related field is required. This is an excellent opportunity for someone passionate about AI and high-performance computing to make significant contributions to the field of deep learning inference.