Senior Software Architect - Deep Learning and HPC Communications

NVIDIA is the world leader in accelerated computing, pioneering GPU technology and innovations in AI and digital twins.
Staff Software Engineer
Hybrid
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Job Description

NVIDIA is seeking a Senior Software Architect to join the team that developed NCCL, NVSHMEM, and GPUDirect. The role focuses on co-designing next-generation data center platforms and scalable communication software for Deep Learning and HPC applications, working on the GPU communication libraries that are crucial for scaling applications across thousands of GPUs. The ideal candidate will advance the state of the art in performance, design new communication technologies, and implement innovative solutions for next-generation platforms. The role requires expertise in parallel programming, system architecture, and network topology.

NVIDIA offers a collaborative environment working alongside GPU, Networking, and SW architects, with the opportunity to shape the future of high-performance computing and AI infrastructure. The position offers competitive compensation and benefits, with the flexibility to work remotely across several European locations.

Last updated 2 days ago

Responsibilities For Senior Software Architect - Deep Learning and HPC Communications

  • Investigate opportunities to improve communication performance by identifying bottlenecks
  • Design and implement new communication technologies to accelerate AI and HPC workloads
  • Explore innovative solutions in HW and SW for next generation platforms
  • Build proofs-of-concept, conduct experiments, and perform quantitative modeling
  • Use simulation to explore performance of large GPU clusters

Requirements For Senior Software Architect - Deep Learning and HPC Communications

Linux
Python
  • M.S./Ph.D. degree in CS/CE or equivalent experience
  • 5+ years of relevant experience
  • Excellent C/C++ programming and debugging skills
  • Experience with parallel programming models (MPI, SHMEM)
  • Deep understanding of operating systems, computer and system architecture
  • Solid fundamentals of network architecture, topology, and algorithms
  • Strong experience with Linux
  • Ability to work and communicate effectively in a multi-national environment