Senior Performance Research and Analysis Engineer

NVIDIA

NVIDIA is the world leader in accelerated computing, pioneering solutions in AI and digital twins transforming major industries.

London, UK

Senior Software Engineer

Remote

5+ years of experience

Description For Senior Performance Research and Analysis Engineer

NVIDIA, the world leader in accelerated computing, is seeking a Senior Performance Research and Analysis Engineer to join its Performance group. This role focuses on profiling and analyzing AI workloads on large-scale GPU and CPU clusters, specifically for distributed Deep Learning LLM training.

The position offers a unique opportunity to work with cutting-edge hardware and platforms, including HCAs, Switches, CPUs, GPUs, and Systems. You'll be at the forefront of performance optimization for AI systems, developing and implementing analysis tools and methodologies to understand performance expectations, limitations, and bottlenecks.

Key responsibilities include researching AI workloads and DL models for large-scale training, conducting comprehensive performance analysis, and collaborating across hardware and software teams. The role requires expertise in high-performance networking, with a focus on RDMA and MPI, along with strong programming skills in Python, Bash, and C.

The ideal candidate will have at least 5 years of experience in high-performance networking, a strong background in computer science or software engineering, and demonstrated expertise with NVIDIA GPUs and deep learning frameworks. This position offers the opportunity to work with state-of-the-art technology and contribute to advancing AI computing performance.

NVIDIA provides a diverse and inclusive work environment, offering the chance to work on transformative technology that impacts major industries worldwide. The remote work option across multiple European locations provides flexibility while working with global teams on cutting-edge AI and computing challenges.

Last updated 7 months ago

Responsibilities For Senior Performance Research and Analysis Engineer

Profile and analyze AI workloads on large-scale GPU and CPU clusters for distributed Deep Learning LLM training
Research AI workloads and DL models for large-scale deep learning LLM training
Benchmark, profile, and analyze performance to find bottlenecks
Implement performance analysis tools
Collaborate with hardware and software teams
Define performance test plans and set performance expectations

Requirements For Senior Performance Research and Analysis Engineer

B.Sc. in Computer Science or Software Engineering
5+ years of experience with high-performance networking (RDMA, MPI)
Performance Analysis skills and methodologies
Experience with NVIDIA GPUs, the CUDA library, and deep learning frameworks
Fast self-learner with strong analytical skills
Programming languages: Python, Bash, and C
Experience with Linux distributions
Team player with good communication skills