AI Computing Performance Architect, Kernel Dev and Perf Analysis

NVIDIA is the world leader in accelerated computing, pioneering solutions in AI and digital twins.
Machine Learning
Staff Software Engineer
In-Person
5,000+ Employees
4+ years of experience
AI

Description For AI Computing Performance Architect, Kernel Dev and Perf Analysis

NVIDIA, the world leader in accelerated computing, is seeking an AI Computing Performance Architect to join its team in Shanghai. This role focuses on developing and optimizing critical components for NVIDIA's next-generation AI architectures, particularly for Large Language Models (LLMs). The position involves working with cutting-edge deep learning hardware and software optimization technology.

As an AI Computing Performance Architect, you'll be responsible for designing and optimizing major LLM components, including attention mechanisms and GEMM operations. You'll work directly with NVIDIA's latest GPU architectures, conducting detailed performance analysis and optimization of kernel operations. This role requires a deep understanding of GPU programming, particularly CUDA, and the ability to identify and resolve performance bottlenecks.

The ideal candidate brings 4+ years of industry experience in GPU programming or deep learning optimization, with a proven track record of improving kernel performance. Knowledge of LLM architectures, particularly fused multi-head attention (FMHA) and GEMM operations, is highly valued. This position offers the opportunity to shape the future of AI computing infrastructure at one of the industry's leading companies.

Working at NVIDIA means being at the forefront of AI innovation, collaborating with talented teams across architecture, software, and product development. You'll have the chance to make a significant impact on the performance and efficiency of next-generation AI systems, contributing to technologies that are transforming industries worldwide.

Last updated a day ago

Responsibilities For AI Computing Performance Architect, Kernel Dev and Perf Analysis

  • Design, develop, and optimize major LLM layers (e.g., attention, GEMM, inter-GPU communication) for NVIDIA's new architectures
  • Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs
  • Conduct in-depth performance analysis of GPU kernels, including attention and other critical operations
  • Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency
  • Create and maintain workloads and micro-benchmark suites
  • Generate performance projections, comparisons, and detailed analysis reports
  • Collaborate with architecture, software, and product teams

Requirements For AI Computing Performance Architect, Kernel Dev and Perf Analysis

  • 4+ years of industry experience in GPU programming or performance optimization for DL applications
  • Demonstrated experience in analyzing and improving the performance of GPU kernels
  • Experience developing or optimizing FMHA or GEMM for LLMs
  • Expertise in CUDA programming for GPU acceleration
  • Expertise in GPU/CPU core or memory subsystem (MemSys) architecture modeling
  • Excellent communication skills, both written and verbal
  • Strong organizational and time management abilities
