
Senior HPC Performance Engineer

NVIDIA is the world leader in accelerated computing, pioneering GPU technology and AI solutions.
$148,000 - $287,500
Senior Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Job Description

NVIDIA, the pioneer in GPU technology and accelerated computing, is seeking a Senior HPC Performance Engineer to join its GPU Communications Libraries and Networking team. This role sits at the intersection of high-performance computing and artificial intelligence, working on critical communication libraries such as NCCL, NVSHMEM, and UCX that power massive-scale GPU deployments.

The position offers a unique opportunity to influence the roadmap of communication libraries that are essential for modern deep learning and HPC applications running on tens of thousands of GPUs. You'll be working with cutting-edge technology, including high-speed interconnects like NVLink and PCIe, and networking solutions such as InfiniBand and Ethernet.

As a Senior HPC Performance Engineer, you'll be responsible for conducting detailed performance analysis on large-scale GPU clusters, optimizing communication between GPUs, and ensuring that performance holds up at scale. The role requires a deep understanding of computer architecture, parallel programming, and system performance optimization.

The ideal candidate brings strong technical expertise in HPC and performance engineering, with at least 3 years of experience in parallel programming and communication runtimes. You should be comfortable with both low-level programming (C/C++) and cluster and cloud tooling (containers, Kubernetes, Slurm). Experience with CUDA programming, GPUs, and deep learning frameworks would be particularly valuable.

NVIDIA offers competitive compensation, with a base salary ranging from $148,000 to $287,500 USD depending on level and experience, plus equity and comprehensive benefits. You'll be joining a dynamic, global team that's pushing the boundaries of what's possible in AI and HPC, making this an excellent opportunity for someone passionate about high-performance computing and system optimization.

Last updated 4 days ago

Responsibilities For Senior HPC Performance Engineer

  • Conduct in-depth performance characterization and analysis on large multi-GPU and multi-node clusters
  • Study the interaction of libraries with hardware and software components
  • Evaluate proof-of-concepts and conduct trade-off analysis
  • Triage and root-cause performance issues reported by customers
  • Collect performance data and build tools for visualization and analysis
  • Collaborate with team members across multiple time zones

Requirements For Senior HPC Performance Engineer

Python · Linux · Kubernetes
  • M.S. or Ph.D. in Computer Science or a related field
  • 3+ years of experience with parallel programming and communication runtimes
  • Experience with performance benchmarking on large-scale HPC clusters
  • Understanding of computer system architecture and operating systems
  • Ability to implement micro-benchmarks in C/C++
  • Proficiency in Python
  • Experience with containers, cloud provisioning, and scheduling tools
  • Strong communication skills and ability to work across different teams

Benefits For Senior HPC Performance Engineer

  • Equity
