
Senior System Software Engineer, NCCL - Partner Enablement

NVIDIA is the world leader in accelerated computing, pioneering GPU technology and AI solutions.
Senior Software Engineer
Hybrid
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Job Description

NVIDIA, the pioneer in GPU technology and accelerated computing, is seeking a Senior System Software Engineer for its NCCL Partner Enablement team. This role sits at the intersection of high-performance computing and artificial intelligence, working with the team that developed NCCL, NVSHMEM, and GPUDirect. The position involves optimizing communication libraries crucial for scaling Deep Learning and HPC applications across large clusters with high-speed networking.

The role offers an exceptional opportunity to work with cutting-edge technology in AI and HPC, engaging directly with key partners and customers to solve complex technical challenges. You'll be responsible for performance analysis, tool development, and providing expert guidance on multi-node cluster implementations. The ideal candidate brings strong expertise in C/C++ programming, parallel computing, and high-performance networking, along with experience in cloud platforms and container technologies.

At NVIDIA, you'll be part of a team that's transforming industries through AI and high-performance computing innovations. The company offers competitive compensation, comprehensive benefits, and a work environment that champions diversity and inclusion. This position provides unique exposure to both the technical depths of GPU computing and the collaborative aspects of partner enablement, making it an ideal opportunity for someone passionate about advanced computing technologies and their practical applications.

Last updated 2 days ago

Responsibilities For Senior System Software Engineer, NCCL - Partner Enablement

  • Engage with partners and customers to root-cause functional and performance issues reported with NCCL
  • Conduct performance characterization and analysis of NCCL and DL applications on GPU clusters
  • Develop tools and automation to isolate issues on new systems and platforms
  • Advise customers and support teams on HPC
  • Write documentation and conduct trainings and webinars for NCCL
  • Engage with internal teams on networking, GPUs, storage, infrastructure and support

Requirements For Senior System Software Engineer, NCCL - Partner Enablement

Python · Linux · Kubernetes
  • B.S./M.S. degree in CS/CE, or equivalent experience, plus 5+ years of relevant experience
  • Experience with parallel programming and communication runtimes
  • Excellent C/C++ programming skills
  • Experience working with engineering or academic research community supporting HPC or AI
  • Practical experience with high-performance networking
  • Expertise in Linux fundamentals and Python
  • Familiar with containers, cloud provisioning and scheduling tools
  • Flexibility to work across different teams and timezones