
Senior HPC and AI Networking Performance Research and Analysis Engineer

World leader in accelerated computing, pioneering AI and digital twins technology.
$148,000 - $287,500
Senior Software Engineer
Hybrid
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Senior HPC and AI Networking Performance Research and Analysis Engineer

NVIDIA is seeking a Senior High Performance Computing (HPC) and AI Networking Performance Research and Analysis Engineer to join its Performance group. This role focuses on profiling and analyzing AI workloads on large-scale GPU and CPU clusters for distributed deep learning LLM training, with an emphasis on collective communication and networking.

The position offers an opportunity to work with cutting-edge AI and HPC technology, including hands-on experience with hardware platforms such as HCAs, switches, CPUs, and GPUs. The role involves developing performance analysis tools and methodologies to understand performance expectations, limitations, and bottlenecks in large-scale AI systems.

NVIDIA, as the world leader in accelerated computing, provides an innovative environment where you'll be at the forefront of AI and deep learning advancement. The company has been transforming computer graphics, PC gaming, and accelerated computing for over 25 years, and is now leading the charge in AI development.

The compensation is highly competitive, with a base salary range of $148,000 - $287,500 USD depending on level and experience. The position includes equity grants and comprehensive benefits. You'll be working in a hybrid environment in Santa Clara, CA, collaborating with various teams across hardware and software domains.

This role is perfect for someone with strong technical expertise in high-performance networking, deep learning frameworks, and performance analysis, combined with excellent analytical and problem-solving skills. The ideal candidate should have at least 5 years of relevant experience and be passionate about pushing the boundaries of AI and HPC technology.

Join NVIDIA's team to make a lasting impact on the world of AI and high-performance computing, working with some of the most advanced technology in the industry.

Last updated 2 days ago

Responsibilities For Senior HPC and AI Networking Performance Research and Analysis Engineer

  • Profile and analyze AI workloads on large-scale GPU and CPU clusters for distributed deep learning LLM training
  • Explore and research AI workloads and DL models for large-scale deep learning LLM training
  • Benchmark, profile, and analyze performance to find bottlenecks
  • Implement performance analysis tools
  • Collaborate with hardware and software teams
  • Define performance test plans and set performance expectations

Requirements For Senior HPC and AI Networking Performance Research and Analysis Engineer

  • B.Sc in Computer Science or Software Engineering or equivalent experience
  • 5+ years of experience with high-performance networking (RDMA, MPI, NCCL, congestion control algorithms)
  • Demonstrated Performance Analysis skills and methodologies
  • Experience with NVIDIA GPUs, CUDA libraries, and deep learning frameworks
  • Fast, self-directed learner with strong analytical skills
  • Programming languages: Python, Bash, and C
  • Experience with Linux distributions
  • Great teammate with good communication and interpersonal skills

Benefits For Senior HPC and AI Networking Performance Research and Analysis Engineer

  • Medical insurance
  • Equity
  • Competitive salaries
  • Comprehensive benefits package
