AI/HPC Network Engineer

Meta builds technologies that help people connect, find communities, and grow businesses, including Facebook, Messenger, Instagram, WhatsApp, and AR/VR technologies.
$147,000 - $208,000
Backend
Senior Software Engineer
In-Person
5,000+ Employees
4+ years of experience
AI · Enterprise SaaS

Description For AI/HPC Network Engineer

Meta is seeking an AI/HPC Network Engineer to join its AI Infrastructure team, focusing on scaling and evolving the network infrastructure that connects GPU systems for AI training and inference. This role sits at the intersection of high-performance computing and AI infrastructure, requiring expertise in both networking and distributed systems.

The position involves designing and operating large-scale networking systems that support Meta's exponentially growing AI training infrastructure. You'll be working on cutting-edge challenges in network fabric, host networking, communications libraries, and scheduling infrastructure, all while ensuring the network meets stringent performance and availability requirements for RDMA workloads.

As an AI/HPC Network Engineer, you'll be responsible for researching and implementing various network topologies, developing automation tools, and working closely with hardware and software teams to influence the future of AI networking infrastructure. The role requires a strong background in datacenter networks, programming skills in languages like Python and Go, and experience with network automation.

The ideal candidate will have 4+ years of experience working on networks that support large-scale training workloads, a deep understanding of the demands AI training workloads place on networks, and expertise in IB/RDMA/RoCE networks. You'll be part of Meta's mission to build the next evolution in social technology, working on infrastructure that powers AI applications across Meta's family of apps and future technologies.

Meta offers a competitive compensation package including a base salary range of $147,000-$208,000, plus bonus, equity, and comprehensive benefits. This is an opportunity to work at the forefront of AI infrastructure, solving complex technical challenges while contributing to systems that operate at unprecedented scale.

Last updated 15 hours ago

Responsibilities For AI/HPC Network Engineer

  • Design, develop, test, and operate networking systems to support large-scale AI training jobs
  • Research, develop and deploy technologies and network topologies to evolve and scale AI networks
  • Work closely with hardware, software and sourcing teams to develop new networking solutions
  • Define and develop optimized network automation tools and systems
  • Be on call to learn from real-world production challenges

Requirements For AI/HPC Network Engineer

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Experience in designing, deploying and operating datacenter networks at scale
  • Experience coding in languages like Python, C++, Go
  • Experience in network automation software leveraging software-defined networking principles
  • 4+ years of experience working on networks supporting large-scale training workloads (preferred)
  • Understanding of AI training workloads and demands they exert on networks (preferred)
  • Experience with IB/RDMA/RoCE Networks (preferred)
  • Understanding of RDMA congestion control mechanisms on IB and RoCE Networks (preferred)

Benefits For AI/HPC Network Engineer

  • Medical insurance
  • Competitive salary
  • Bonus
  • Equity
  • Benefits package

