Senior AI Networking Performance Research and Analysis Engineer

NVIDIA

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twins technology.

Shanghai, China • Guangzhou, Guangdong Province, China • Beijing, China…

Machine Learning

Senior Software Engineer

Hybrid

5,000+ Employees

5+ years of experience

AI · Enterprise SaaS

Description For Senior AI Networking Performance Research and Analysis Engineer

NVIDIA is seeking a Senior High Performance Computing (HPC) and AI Networking Performance Research and Analysis Engineer to join their Performance group. This role sits at the intersection of AI, distributed systems, and high-performance computing, focusing on optimizing networking performance for large-scale deep learning and LLM training.

The position offers an opportunity to work with cutting-edge technology in AI and GPU computing, profiling and analyzing AI workloads on large-scale GPU clusters. You'll work with a range of hardware, including HCAs, switches, CPUs, and GPUs, and develop performance analysis tools and methodologies to understand and optimize system performance.

As part of NVIDIA, you'll be joining a company at the forefront of AI and accelerated computing innovation. NVIDIA has been redefining computer graphics and computing for over 25 years and is now leading the charge in AI development. The company offers competitive salaries and comprehensive benefits in a diverse, supportive environment.

The ideal candidate will bring strong expertise in high-performance networking, deep learning frameworks, and performance analysis. You'll need to demonstrate proficiency in Python, Bash, and C, along with experience in Linux environments. Knowledge of CUDA, the NCCL library, and congestion control algorithms would be particularly valuable.

This role offers the chance to make a significant impact on the future of AI computing, working with some of the most advanced technology in the field. You'll be part of a team pushing the boundaries of what's possible in distributed deep learning and high-performance computing, while contributing to NVIDIA's mission of solving challenges no one else can solve.

Last updated a day ago

Responsibilities For Senior AI Networking Performance Research and Analysis Engineer

Profile and analyze AI workloads on large-scale GPU and CPU clusters for distributed deep learning and LLM training
Explore and research AI workloads and DL models for large-scale deep learning and LLM training
Benchmark, profile, and analyze performance to find bottlenecks
Implement performance analysis tools
Collaborate with hardware and software teams
Define performance test planning and set performance expectations

Requirements For Senior AI Networking Performance Research and Analysis Engineer

Python

Linux

Kubernetes

B.Sc. in Computer Science or Software Engineering, or equivalent experience
5+ years of experience with high-performance Networking (RDMA, MPI, NCCL, Congestion Control Algorithms)
Demonstrated Performance Analysis skills and methodologies
Experience with NVIDIA GPUs, CUDA libraries, and deep learning frameworks
Fast, self-directed learner with strong analytical skills
Programming languages: Python, Bash, and C
Experience with Linux distributions
Great teammate with good communication skills
