Senior System Software Architect, HPC Networking

NVIDIA is the world leader in accelerated computing, tackling challenges no one else can solve.
Backend
Senior Software Engineer
Hybrid
5+ years of experience
AI · Enterprise SaaS
This job posting may no longer be active.

Description For Senior System Software Architect, HPC Networking

NVIDIA is seeking a highly motivated Senior System Software Architect for HPC Networking to join its team of experts shaping the future of high-performance and ML/AI computing. The role involves working on next-generation Ethernet, InfiniBand, and NVLink systems that power advanced compute clusters and supercomputers.

Key responsibilities include:

  • Creating proofs-of-concept for AI frameworks, runtime designs, and network hardware features
  • Researching and implementing features for AI and HPC communication middleware and deep learning frameworks
  • Designing and developing hardware features for scientific, deep learning, and data-intensive workloads
  • Collaborating with customers to understand needs and provide innovative solutions

Requirements:

  • Ph.D., Master's, or Bachelor's degree in computer science, computer engineering, electrical engineering, or a related field
  • 5+ years of experience in DNNs, scaling of DNNs, parallelism in DNN frameworks, or deep learning training workloads
  • Deep understanding of parallelism techniques
  • Experience with AI network parallelism using collective libraries and RDMA/RoCE
  • Strong programming and software development skills

Preferred qualifications:

  • Deep understanding of technology and passion for the field
  • Strong collaborative and interpersonal skills
  • Background in designing communication middleware for high-performance computing systems
  • Experience with CUDA programming and NVIDIA GPUs

NVIDIA offers a diverse work environment and is an equal opportunity employer. The position is hybrid, allowing for flexibility in work arrangements. Join NVIDIA to work on innovative technology driving the world forward in AI, HPC, and accelerated computing.

Last updated 8 months ago

Responsibilities For Senior System Software Architect, HPC Networking

  • Create proofs-of-concept to evaluate and motivate extensions in AI frameworks (PyTorch/NeMo), new runtime designs, and new network hardware features
  • Research, design, and implement features for AI and HPC communication middleware (NCCL, UCX, UCC) and deep learning frameworks such as TensorFlow and PyTorch
  • Research, design, and develop hardware features relevant to scientific, deep learning, and data-intensive workloads
  • Collaborate with customers to understand their needs and provide innovative solutions

Requirements For Senior System Software Architect, HPC Networking

  • Ph.D., Master's, or Bachelor's degree in computer science, computer engineering, electrical engineering, or a closely related field
  • 5+ years of experience in DNNs, scaling of DNNs, parallelism in DNN frameworks, or deep learning training workloads
  • Deep understanding of parallelism techniques, including data parallelism, pipeline parallelism, tensor parallelism, and Fully Sharded Data Parallel (FSDP)
  • Experience with AI network parallelism using collective libraries and RDMA/RoCE
  • Background in algorithm design, system programming, and computer architecture
  • Strong programming and software development skills
  • Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment

Benefits For Senior System Software Architect, HPC Networking

  • Opportunity to work on innovative technology
  • Diverse work environment
  • Equal opportunity employer
