Senior System Software Engineer, NCCL - Partner Enablement

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twin technology.
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Job Description

NVIDIA, the pioneer in accelerated computing and GPU technology, is seeking a Senior System Software Engineer to join its NCCL team. This role focuses on partner enablement for NVIDIA's GPU communication libraries, which are essential for scaling Deep Learning and HPC applications. The position offers an opportunity to work with cutting-edge technology in the AI networking stack, collaborating with the team that developed NCCL, NVSHMEM, and GPUDirect.

The role involves deep engagement with partners and customers, conducting performance analysis on GPU clusters, and developing tools for issue isolation across various cloud platforms. You'll be working at the intersection of high-performance computing and artificial intelligence, helping to optimize communication libraries that power some of the world's most advanced AI and HPC applications.

As a Senior System Software Engineer, you'll need strong expertise in C/C++ programming, parallel computing, and high-performance networking protocols. The ideal candidate will have extensive experience with Linux systems, containerization, and cloud technologies. The position requires both technical depth in system software and the ability to work effectively with partners and customers across different time zones.

NVIDIA offers a competitive compensation package and the opportunity to work on groundbreaking technology that's transforming industries through AI and high-performance computing. The company promotes a diverse, inclusive work environment and provides extensive benefits. This role suits someone who is passionate about system software and high-performance computing and who wants to contribute to the future of AI and GPU computing.

Last updated 2 months ago

Responsibilities For Senior System Software Engineer, NCCL - Partner Enablement

  • Engage with partners and customers to root-cause functional and performance issues reported with NCCL
  • Conduct performance characterization and analysis of NCCL and DL applications on GPU clusters
  • Develop tools and automation to isolate issues on new systems and platforms
  • Share HPC knowledge with customers and support teams
  • Write documentation and conduct trainings and webinars for NCCL
  • Engage with internal teams on networking, GPUs, storage, infrastructure and support

Requirements For Senior System Software Engineer, NCCL - Partner Enablement

  • B.S./M.S. degree in CS/CE or equivalent experience with 5+ years of relevant experience
  • Experience with parallel programming and communication runtimes
  • Excellent C/C++ programming skills
  • Experience working with an engineering or academic research community supporting HPC or AI
  • Practical experience with high-performance networking
  • Expert in Linux fundamentals and Python
  • Familiar with containers, cloud provisioning and scheduling tools
  • Flexibility to work across different teams and time zones
