AI/HPC Network Engineer

Meta

Meta builds technologies that help people connect, find communities, and grow businesses, including Facebook, Messenger, Instagram, WhatsApp, and AR/VR technologies.

Menlo Park, CA, USA

$147,000 - $208,000

Backend

Senior Software Engineer

In-Person

5,000+ Employees

4+ years of experience

AI · Enterprise SaaS

Description For AI/HPC Network Engineer

Meta is seeking an AI/HPC Network Engineer to join its AI Infrastructure team, focusing on scaling and evolving the network infrastructure that connects GPU systems for AI training and inference. This role sits at the intersection of high-performance computing and AI infrastructure and requires expertise in both networking and distributed systems.

The position involves designing and operating large-scale networking systems that support Meta's rapidly growing AI training infrastructure. You'll work on challenges in network fabric, host networking, communications libraries, and scheduling infrastructure, while ensuring the network meets stringent performance and availability requirements for RDMA workloads.

As an AI/HPC Network Engineer, you'll research and implement network topologies, develop automation tools, and work closely with hardware and software teams to influence the future of AI networking infrastructure. The role requires a strong background in datacenter networks, programming skills in languages like Python, C++, and Go, and experience with network automation.

The ideal candidate will have 4+ years of experience with networks supporting large-scale training workloads, a deep understanding of the demands AI training workloads place on networks, and expertise in IB/RDMA/RoCE networks. You'll be part of Meta's mission to build the next evolution in social technology, working on infrastructure that powers AI applications across Meta's family of apps and future technologies.

Meta offers a competitive compensation package including a base salary range of $147,000-$208,000, plus bonus, equity, and comprehensive benefits. This is an opportunity to work at the forefront of AI infrastructure, solving complex technical challenges while contributing to systems that operate at unprecedented scale.

Last updated 2 months ago

Responsibilities For AI/HPC Network Engineer

Design, develop, test, and operate networking systems to support large-scale AI training jobs
Research, develop, and deploy technologies and network topologies to evolve and scale AI networks
Work closely with hardware, software, and sourcing teams to develop new networking solutions
Define and develop optimized network automation tools and systems
Participate in on-call rotations to learn from real-world production challenges

Requirements For AI/HPC Network Engineer

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
Experience designing, deploying, and operating datacenter networks at scale
Experience coding in languages like Python, C++, and Go
Experience with network automation software that applies software-defined networking principles
4+ years of experience working on networks supporting large-scale training workloads (preferred)
Understanding of AI training workloads and the demands they place on networks (preferred)
Experience with IB/RDMA/RoCE Networks (preferred)
Understanding of RDMA congestion control mechanisms on IB and RoCE Networks (preferred)

Benefits For AI/HPC Network Engineer

Medical insurance
Competitive salary
Bonus
Equity
Benefits package
