NVIDIA is seeking an experienced Principal Deep Learning Engineer to join its team focused on analyzing and improving the performance of LLM inference. This role is at the forefront of NVIDIA's rapidly growing research and development in Deep Learning inference. The position involves working with GPU-accelerated Deep Learning software such as TensorRT, developing DL benchmarking software, and creating performant solutions for model deployment and serving.
The role requires collaborating with the deep learning community to implement cutting-edge algorithms in TensorRT-LLM, vLLM, SGLang, and LLM benchmarks. You'll be responsible for identifying performance opportunities and optimizing state-of-the-art LLM models across NVIDIA's full range of accelerators, from datacenter GPUs to edge SoCs. The work involves implementing LLM inference, serving, and deployment algorithms across these frameworks and in CUDA kernels.
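For a flavor of the benchmarking side of the role, the sketch below shows a toy throughput measurement of the kind such work builds on. It is purely illustrative and not part of the posting: `generate` is a hypothetical stand-in for a real inference client (for example, a TensorRT-LLM or vLLM endpoint).

```python
import time

def generate(prompt: str, max_new_tokens: int) -> int:
    # Hypothetical stand-in for a real inference call (e.g., a TensorRT-LLM
    # or vLLM client); returns the number of tokens produced.
    time.sleep(0.05)  # simulate decode latency
    return max_new_tokens

def benchmark(prompts: list[str], max_new_tokens: int = 128) -> float:
    """Measure end-to-end throughput (tokens/sec) over a batch of prompts."""
    start = time.perf_counter()
    total_tokens = sum(generate(p, max_new_tokens) for p in prompts)
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed

if __name__ == "__main__":
    prompts = ["Hello, world!"] * 8
    print(f"Throughput: {benchmark(prompts):.1f} tokens/sec")
```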
As a Principal Engineer, you'll work with teams across performance modeling, analysis, kernel development, and inference software development. The position offers the opportunity to contribute to NVIDIA's mission of advancing AI computing, working with the technology that powers breakthroughs in LLMs, generative AI, recommender systems, and computer vision applications.
The role comes with competitive compensation, including a base salary range of $272,000 - $425,500 USD, plus equity and benefits. This hybrid position is based in Santa Clara, CA, offering the flexibility of both office and remote work. Join NVIDIA in shaping the future of AI computing and be part of a team that's transforming industries through innovative deep learning solutions.