NVIDIA is seeking a Senior Performance Analysis Engineer to join its Performance group, focusing on profiling and analyzing AI workloads on large-scale GPU and CPU clusters for distributed deep learning training of LLMs. This role sits at the intersection of high-performance computing and artificial intelligence and involves working with cutting-edge technology including GPUs, CPUs, and networking systems.
The position involves deep technical work with NVIDIA's supercomputers and distributed systems, with a particular focus on high-performance networking and the NVIDIA Collective Communications Library (NCCL). You'll be responsible for benchmarking, profiling, and analyzing performance to optimize large-scale AI systems, while developing new tools and methodologies for performance analysis.
As a Senior Performance Analysis Engineer, you'll collaborate across multiple teams, from hardware to software, providing crucial insights that drive performance improvements. The role requires extensive experience with high-performance networking protocols and technologies, combined with a strong understanding of GPU computing and deep learning frameworks.
NVIDIA offers a unique opportunity to work at the forefront of AI and accelerated computing, with access to the latest technology and the chance to make a significant impact on the future of computing. The company provides competitive salaries and comprehensive benefits, fostering an environment where innovation and technical excellence are highly valued.
The ideal candidate will bring a combination of technical expertise in performance analysis, networking protocols, and AI systems, along with strong analytical and problem-solving skills. This role offers the opportunity to work on challenging problems at scale, contributing to NVIDIA's mission of advancing the field of artificial intelligence and accelerated computing.