Distinguished Software Architect - Deep Learning and HPC Communications

NVIDIA is the world leader in accelerated computing, pioneering GPU technology and transforming industries through AI and digital twins.
$308,000 - $471,500
Principal Software Engineer
In-Person
5,000+ Employees
15+ years of experience
AI · Enterprise SaaS

Job Description

NVIDIA, the pioneer in GPU technology and leader in accelerated computing, is seeking a Distinguished Software Architect to join its GPU Communications Libraries and Networking team. This role is an opportunity to shape the future of deep learning and high-performance computing communications infrastructure.

The position involves working on cutting-edge communication libraries like NCCL, NVSHMEM, and UCX that power applications running on massive GPU clusters. You'll be at the forefront of designing next-generation data center platforms that push the boundaries of what's possible in AI and HPC. The role requires deep expertise in GPU architecture, high-performance networking, and distributed systems.

As a Distinguished Software Architect, you'll be responsible for researching and implementing new communication technologies, co-designing hardware and software solutions with various architectural teams, and ensuring seamless integration across the technology stack. The impact of your work will be direct and significant, as communication performance between GPUs directly affects application performance at scale.

The ideal candidate brings 15+ years of experience, with demonstrated leadership in HPC/DL communications evidenced through patents, publications, and conference presentations. You should be an expert in parallel programming models, GPU architecture, and high-performance networking technologies. Strong programming skills in C/C++ and experience with deep learning frameworks are essential.

This role offers the opportunity to work at the intersection of hardware and software, pushing the boundaries of communication technology in one of the world's most innovative companies. You'll collaborate with leading researchers and engineers while helping to shape the future of AI and HPC infrastructure. The position includes competitive compensation, equity, and the chance to work on technologies that are transforming multiple industries.

Last updated a month ago

Responsibilities For Distinguished Software Architect - Deep Learning and HPC Communications

  • Research new communication technologies and design features for communication libraries
  • Propose innovative solutions in HW and SW for next-gen platforms
  • Inspire changes based on quantitative data and technical analysis
  • Drive adoption of new communication technologies
  • Collaborate with DL researchers and customers

Requirements For Distinguished Software Architect - Deep Learning and HPC Communications

  • PhD in Computer Science, Computer Engineering, or a related field, or strong equivalent experience; 15+ years of relevant experience
  • Expertise in HPC, parallel programming models (MPI, SHMEM), and communication runtime systems
  • Deep understanding of high performance networking
  • Strong knowledge of ML/DL fundamentals, parallel algorithms, and fault tolerance
  • Programming fluency with C or C++ for systems software development
  • Flexibility to work across different teams and timezones

Benefits For Distinguished Software Architect - Deep Learning and HPC Communications

  • Equity
