NVIDIA is seeking an experienced Principal Deep Learning Engineer to analyze and improve the performance of LLM inference. This role sits at the intersection of deep learning and high-performance computing, working with cutting-edge LLM technologies on NVIDIA's GPU platforms.
The position involves working with state-of-the-art large language models and implementing performance optimizations across NVIDIA's full range of accelerators, from datacenter GPUs to edge SoCs. You'll collaborate with the deep learning community to implement the latest algorithms for public release in TensorRT-LLM, vLLM, SGLang, and LLM benchmarks.
As a Principal Engineer, you'll be responsible for scaling performance across different architectures, optimizing for maximum throughput and minimum latency, and contributing to both NVIDIA's own and open-source LLM frameworks. The role requires deep expertise in Python, C, and C++ programming and experience with deep learning frameworks such as PyTorch, JAX, or TensorFlow.
NVIDIA's position as the "AI computing company" makes this an exciting opportunity to work on technology that's transforming industries. You'll be part of the team that enables the performance optimization, deployment, and serving of deep learning solutions used by companies worldwide.
The role offers competitive compensation with a base salary range of $272,000 - $425,500 USD, plus equity and benefits. Working in a hybrid environment, you'll collaborate with diverse teams across generative AI, automotive, and image and speech understanding to develop innovative solutions.
This is an opportunity to work at the forefront of AI technology, specifically in the rapidly growing field of large language models, while contributing to software that powers breakthroughs in areas like Generative AI, Recommenders, and Computer Vision.