NVIDIA, a pioneer in accelerated computing and AI technology, is seeking a Senior Software Engineer specializing in Deep Learning Inference. This role sits at the intersection of AI development and performance optimization, working with the latest generative AI models. The position involves building software for efficient inference at every level of the stack, from server-level request batching to GPU kernel fusion. You'll work with NVIDIA's open-source AI runtimes, including Triton Inference Server and TensorRT-LLM, optimize inference workloads, and implement low-level GPU code. The ideal candidate combines strong software engineering principles with deep machine learning knowledge and performance optimization expertise. You'll collaborate with global teams and contribute to production-grade products that push the boundaries of AI acceleration. NVIDIA offers the opportunity to work with some of the most forward-thinking professionals in technology, in an environment that values creativity, autonomy, and diversity. This role suits someone passionate about both software engineering excellence and advancing the state of AI technology.