NVIDIA, the world leader in accelerated computing, is seeking an AI/ML Performance Engineer to drive the development of next-generation inference optimizations. This role sits at the intersection of AI model development and system performance optimization, focusing on scalable inference strategies and cross-stack optimizations. The position involves working with cutting-edge AI technologies, including attention mechanisms, speculative decoding, and system-level techniques for model deployment.
The role requires collaboration across multiple teams, including deep learning research, framework development, compiler engineering, and silicon architecture. The successful candidate will be responsible for developing performance models, designing optimizations for inference deployment, and quantifying performance benefits to guide future software and hardware roadmaps. This position offers an opportunity to shape the future of datacenter technology and AI infrastructure at one of technology's most innovative companies.
The ideal candidate should possess strong technical skills in computer architecture and AI/ML systems, with experience in performance analysis and optimization. They should be comfortable with Python programming and have a solid understanding of LLM internals. The role offers competitive compensation, including a base salary range of $148,000 to $287,500, equity, and additional benefits.
NVIDIA's commitment to diversity and inclusion, combined with its position at the forefront of AI innovation, makes this an excellent opportunity for someone looking to make a significant impact on AI and machine learning optimization. The position also provides exposure to the latest developments in AI technology while working alongside some of the industry's most forward-thinking professionals.