Senior System Software Architect, HPC and AI Networking

NVIDIA is the world leader in accelerated computing, pioneering solutions in AI and digital twins.
Principal Software Engineer
Hybrid
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Senior System Software Architect, HPC and AI Networking

NVIDIA, the world leader in accelerated computing, is seeking a Senior System Software Architect specializing in HPC and AI Networking. This role sits at the intersection of high-performance computing and artificial intelligence, focusing on building next-generation software and hardware systems for the most demanding AI workloads.

The position involves designing and implementing scalable software systems that optimize distributed AI training and inference, with a particular focus on throughput, latency, and memory efficiency. You'll work with cutting-edge communication libraries like NCCL, UCX, and UCC, collaborating with major AI framework teams including TensorFlow, PyTorch, and JAX to enhance performance and reliability.

As a Senior System Software Architect, you'll be instrumental in co-designing hardware features for GPUs, DPUs, and interconnects, directly contributing to the evolution of runtime systems and AI-specific protocol layers. The role requires deep expertise in DNNs, scaling, and parallelism, combined with strong programming skills and system architecture knowledge.

NVIDIA offers a dynamic, innovative environment where you'll work with world-class researchers and engineers. The company's commitment to fostering diversity and inclusion, combined with its position at the forefront of AI and accelerated computing, makes this an exceptional opportunity for someone passionate about shaping the future of technology.

The role is based in Beijing, China, with a hybrid work arrangement that combines flexible working with collaboration across global teams. This position represents a unique chance to impact the future of AI infrastructure at one of technology's most respected and innovative companies.

Last updated 3 days ago

Responsibilities For Senior System Software Architect, HPC and AI Networking

  • Design and prototype scalable software systems that optimize distributed AI training and inference
  • Develop and evaluate enhancements to communication libraries (NCCL, UCX, UCC)
  • Collaborate with AI framework teams (TensorFlow, PyTorch, JAX)
  • Co-design hardware features for GPUs, DPUs, and interconnects
  • Contribute to runtime systems, communication libraries, and AI-specific protocol layers
  • Collaborate with customers to understand needs and provide solutions

Requirements For Senior System Software Architect, HPC and AI Networking

Python
  • Ph.D., Master's, or Bachelor's degree in computer science, computer engineering, electrical engineering, or a related field
  • 5+ years of experience with DNNs, DNN scaling, and parallelism in DNN frameworks
  • Deep understanding of inference and training workloads and their optimizations
  • Experience with AI network parallelism using collective libraries and RDMA/RoCE
  • Background in algorithm design, system programming, and computer architecture
  • Strong programming and software development skills
  • Ability to work and communicate effectively in a multinational environment

Jobs Related To NVIDIA Senior System Software Architect, HPC and AI Networking

Distinguished Engineer – Data Center System Software Architect

Lead system software architecture for NVIDIA's data center platforms, working with cutting-edge GPU/CPU technology and major cloud providers. 20+ years of experience required.

Senior Software Architect - GPU Fabric Networking

Senior Software Architect position at NVIDIA focusing on GPU Fabric Networking, offering $184K-$356.5K salary plus equity, requiring 10+ years of system architecture experience.

Distinguished Engineer – Data Center System Software Architect

Lead system software architecture for NVIDIA's data center systems, working with cutting-edge GPU technology and major cloud providers. 20+ years of experience required.

Distinguished Systems Software Engineer, Graphics Delivery Network Platform

Distinguished Systems Software Engineer role at NVIDIA focusing on cloud streaming platform development, combining GPU expertise with AI technologies and distributed systems architecture.

Distinguished Software Architect - Deep Learning and HPC Communications

Distinguished Software Architect position at NVIDIA focusing on Deep Learning and HPC Communications, requiring 15+ years of experience and expertise in parallel computing and high-performance networking.