
Senior AI Networking Performance Research and Analysis Engineer

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twins technology.
Shanghai, China; Guangzhou, Guangdong Province, China; Beijing, China
Machine Learning
Senior Software Engineer
Hybrid
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Senior AI Networking Performance Research and Analysis Engineer

NVIDIA is seeking a Senior High Performance Computing (HPC) and AI Networking Performance Research and Analysis Engineer to join its Performance group. This role sits at the intersection of AI, distributed systems, and high-performance computing, focusing on optimizing networking performance for large-scale deep learning and LLM training.

The position offers an opportunity to work with cutting-edge AI and GPU computing technology, profiling and analyzing AI workloads on large-scale GPU clusters. You'll work with a range of hardware platforms, including host channel adapters (HCAs), switches, CPUs, and GPUs, and develop performance analysis tools and methodologies to understand and optimize system performance.

As part of NVIDIA, you'll be joining a company at the forefront of AI and accelerated computing innovation. NVIDIA has been redefining computer graphics and computing for over 25 years and is now leading the charge in AI development. The company offers competitive salaries and comprehensive benefits in a diverse, supportive environment.

The ideal candidate will bring strong expertise in high-performance networking, deep learning frameworks, and performance analysis. You'll need to demonstrate proficiency in Python, Bash, and C, along with experience in Linux environments. Knowledge of CUDA, the NCCL library, and congestion control algorithms would be particularly valuable.

This role offers the chance to make a significant impact on the future of AI computing, working with some of the most advanced technology in the field. You'll be part of a team pushing the boundaries of what's possible in distributed deep learning and high-performance computing, while contributing to NVIDIA's mission of solving challenges no one else can solve.


Responsibilities For Senior AI Networking Performance Research and Analysis Engineer

  • Profile and analyze AI workloads on large-scale GPU and CPU clusters for distributed deep learning and LLM training
  • Explore and research AI workloads and DL models for large-scale deep learning and LLM training
  • Benchmark, profile, and analyze performance to find bottlenecks
  • Implement performance analysis tools
  • Collaborate with hardware and software teams
  • Define performance test planning and set performance expectations

Requirements For Senior AI Networking Performance Research and Analysis Engineer

Skills: Python, Linux, Kubernetes

  • B.Sc. in Computer Science, Software Engineering, or equivalent experience
  • 5+ years of experience with high-performance networking (RDMA, MPI, NCCL, congestion control algorithms)
  • Demonstrated performance analysis skills and methodologies
  • Experience with NVIDIA GPUs, CUDA libraries, and deep learning frameworks
  • Fast, self-directed learner with strong analytical skills
  • Programming languages: Python, Bash, and C
  • Experience with Linux distributions
  • Great teammate with good communication skills
