Meta is seeking an AI/HPC Network Engineer to join their AI Infrastructure team, focusing on scaling and evolving network infrastructure that connects GPU systems for AI training and inference. This role sits at the intersection of high-performance computing and AI infrastructure, requiring expertise in both networking and distributed systems.
The position involves designing and operating large-scale networking systems that support Meta's exponentially growing AI training infrastructure. You'll be working on cutting-edge challenges in network fabric, host networking, communications libraries, and scheduling infrastructure, all while ensuring the network meets stringent performance and availability requirements for RDMA workloads.
As an AI/HPC Network Engineer, you'll be responsible for researching and implementing various network topologies, developing automation tools, and working closely with hardware and software teams to influence the future of AI networking infrastructure. The role requires a strong background in datacenter networks, programming skills in languages like Python and Go, and experience with network automation.
The ideal candidate will have 4+ years of experience with large-scale training workloads, deep understanding of AI training workload demands, and expertise in IB/RDMA/RoCE Networks. You'll be part of Meta's mission to build the next evolution in social technology, working on infrastructure that powers AI applications across Meta's family of apps and future technologies.
Meta offers a competitive compensation package including a base salary range of $147,000-$208,000, plus bonus, equity, and comprehensive benefits. This is an opportunity to work at the forefront of AI infrastructure, solving complex technical challenges while contributing to systems that operate at unprecedented scale.