AI Computing Performance Architect, Kernel Dev and Perf Analysis

NVIDIA

NVIDIA is the world leader in accelerated computing, pioneering solutions in AI and digital twins.

Shanghai, China

Machine Learning

Staff Software Engineer

In-Person

5,000+ Employees

4+ years of experience

Description For AI Computing Performance Architect, Kernel Dev and Perf Analysis

NVIDIA, the world leader in accelerated computing, is seeking an AI Computing Performance Architect to join its team in Shanghai. This role focuses on developing and optimizing critical components for NVIDIA's next-generation AI architectures, particularly for Large Language Models (LLMs). The position involves working with cutting-edge deep learning hardware and software optimization technology.

As an AI Computing Performance Architect, you'll be responsible for designing and optimizing major LLM components, including attention mechanisms and GEMM operations. You'll work directly with NVIDIA's latest GPU architectures, conducting detailed performance analysis and optimization of kernel operations. This role requires a deep understanding of GPU programming, particularly CUDA, and the ability to identify and resolve performance bottlenecks.

The ideal candidate brings 4+ years of industry experience in GPU programming or deep learning optimization, with a proven track record of improving kernel performance. Knowledge of LLM architectures, particularly FMHA (fused multi-head attention) and GEMM operations, is highly valued. This position offers the opportunity to shape the future of AI computing infrastructure at one of the industry's leading companies.

Working at NVIDIA means being at the forefront of AI innovation, collaborating with talented teams across architecture, software, and product development. You'll have the chance to make a significant impact on the performance and efficiency of next-generation AI systems, contributing to technologies that are transforming industries worldwide.

Last updated a day ago

Responsibilities For AI Computing Performance Architect, Kernel Dev and Perf Analysis

Design, develop, and optimize major layers in LLMs (e.g., attention, GEMM, inter-GPU communication) for NVIDIA's new architectures
Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs
Conduct in-depth performance analysis of GPU kernels, including Attention and other critical operations
Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency
Create and maintain workloads and micro-benchmark suites
Generate performance projections, comparisons, and detailed analysis reports
Collaborate with architecture, software, and product teams

Requirements For AI Computing Performance Architect, Kernel Dev and Perf Analysis

4+ years of industry experience in GPU programming or performance optimization for DL applications
Demonstrated experience in analyzing and improving the performance of GPU kernels
LLM FMHA- or GEMM-related development or optimization experience
Expertise in CUDA programming for GPU acceleration
Expertise in GPU/CPU core or memory system (MemSys) architecture modeling
Excellent communication skills, both written and verbal
Strong organizational and time management abilities