
Senior System Software Engineer, NCCL - Partner Enablement

NVIDIA is the world leader in accelerated computing, pioneering GPU technology and AI solutions.
$148,000 - $287,500
Senior Software Engineer
Remote
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Job Description

NVIDIA, the pioneer in GPU technology and accelerated computing, is seeking a Senior System Software Engineer for its GPU Communications Libraries and Networking team. This role focuses on the NCCL and NVSHMEM communication runtimes for Deep Learning and HPC applications. The position offers an exceptional opportunity to work with a cutting-edge AI networking stack and large-scale GPU clusters.

The role involves close collaboration with partners and customers to optimize performance and resolve technical issues in NCCL implementations. You'll conduct sophisticated performance analysis on GPU clusters, develop automation tools, and provide expert guidance on HPC methodologies. The position requires strong expertise in parallel programming, C/C++, and high-performance networking technologies such as InfiniBand, RoCE, and Ethernet.

As a senior engineer, you'll work across multiple time zones with various teams, contributing to NVIDIA's groundbreaking developments in AI and High Performance Computing. The company offers competitive compensation, including a base salary range of $148,000 - $287,500 (depending on level), equity, and comprehensive benefits.

This role is perfect for someone passionate about high-performance computing, with strong technical skills and the ability to work effectively with partners and customers. You'll be at the forefront of AI and HPC technology, helping to shape the future of GPU communications and networking capabilities at NVIDIA.

Last updated 6 hours ago

Responsibilities For Senior System Software Engineer, NCCL - Partner Enablement

  • Engage with partners and customers to root-cause functional and performance issues reported with NCCL
  • Conduct performance characterization and analysis of NCCL and DL applications on GPU clusters
  • Develop tools and automation to isolate issues on new systems and platforms
  • Guide customers and support teams on HPC methodologies
  • Document and conduct trainings/webinars for NCCL
  • Engage with internal teams on networking, GPUs, storage, infrastructure and support

Requirements For Senior System Software Engineer, NCCL - Partner Enablement

Python
Linux
Kubernetes
  • B.S./M.S. degree in CS/CE or equivalent experience with 5+ years of relevant experience
  • Experience with parallel programming and communication runtimes
  • Excellent C/C++ programming skills
  • Experience working with engineering or academic research community supporting HPC or AI
  • Practical experience with high performance networking
  • Expert in Linux fundamentals and Python scripting
  • Familiar with containers, cloud provisioning and scheduling tools
  • Adaptability and passion to learn new areas and tools
  • Flexibility to work and communicate effectively across different teams and time zones

Benefits For Senior System Software Engineer, NCCL - Partner Enablement

  • Equity
  • Medical Insurance