NVIDIA, the world leader in accelerated computing, is seeking an AI Computing Performance Architect to join its team in Shanghai. This role focuses on developing and optimizing critical components of NVIDIA's next-generation AI architectures, particularly for Large Language Models (LLMs), and involves working with cutting-edge deep learning hardware and software optimization.
As an AI Computing Performance Architect, you'll be responsible for designing and optimizing major LLM components, including attention mechanisms and general matrix multiply (GEMM) operations. You'll work directly with NVIDIA's latest GPU architectures, conducting detailed performance analysis and optimization of kernel operations. The role requires a deep understanding of GPU programming, particularly CUDA, and the ability to identify and resolve performance bottlenecks.
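To make the attention component concrete, here is a minimal NumPy sketch of scaled dot-product attention, the mathematical operation that fused multi-head attention (FMHA) kernels implement. This is purely an illustrative reference, not NVIDIA's implementation: production kernels fuse the softmax and matrix multiplies to avoid materializing the full attention matrix in memory, which is exactly the kind of optimization this role targets.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Reference attention: softmax(Q K^T / sqrt(d)) V.
    FMHA-style kernels fuse these steps on-chip; this sketch
    only shows the math being computed."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (seq_q, seq_k) logits
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ v                             # (seq_q, d) output

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (4, 8)
```

The naive version reads and writes an O(seq_q x seq_k) intermediate, so at LLM sequence lengths its cost is dominated by memory traffic rather than arithmetic, which is why kernel-level fusion matters.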
The ideal candidate brings 4+ years of industry experience in GPU programming or deep learning optimization, with a proven track record of improving kernel performance. Knowledge of LLM architectures, particularly fused multi-head attention (FMHA) and GEMM operations, is highly valued. This position offers the opportunity to shape the future of AI computing infrastructure at one of the industry's leading companies.
Working at NVIDIA means being at the forefront of AI innovation, collaborating with talented teams across architecture, software, and product development. You'll have the chance to make a significant impact on the performance and efficiency of next-generation AI systems, contributing to technologies that are transforming industries worldwide.