Production Systems Engineer, AI Systems

Meta

Meta builds technologies that help people connect, find communities, and grow businesses, including Facebook, Messenger, Instagram, WhatsApp, and AR/VR experiences.

Menlo Park, CA, USA

$132,000 - $191,000

Backend

Senior Software Engineer

In-Person

5,000+ Employees

4+ years of experience

AI · Infrastructure

Description For Production Systems Engineer, AI Systems

Meta is seeking an experienced Production Systems Engineer to join its Release to Production (RTP) team, focusing on AI/ML initiatives and large-scale AI training and inference systems. This role sits at the intersection of hardware and software, working with the server infrastructure that powers Meta's AI services.

The position involves managing the end-to-end Hardware Lifecycle of Meta's servers, including prototyping experimental hardware, conducting pre-production debugging, implementing system monitoring, and developing automated provisioning solutions. The role is crucial in supporting Meta's ambitious AI infrastructure scaling efforts.

As a Production Systems Engineer, you'll work closely with cross-functional teams, including hardware designers, networking teams, system manufacturers, and data center operations, to enable and optimize new systems for production deployment. The role requires deep expertise in network technologies, including NICs, switches, optics, and various protocols.

The ideal candidate should have strong technical skills in server architecture, Linux systems, and networking protocols, with particular emphasis on AI platform integration and scale-out network technologies. You'll be responsible for creating experimental frameworks, developing diagnostic tools, and implementing solutions for hardware health monitoring.

This is an excellent opportunity for someone passionate about large-scale infrastructure and AI systems, offering competitive compensation ($132,000-$191,000/year) plus bonus, equity, and comprehensive benefits. The role is based in Menlo Park, CA, and offers the chance to work on cutting-edge AI infrastructure at one of the world's leading technology companies.

Meta provides a collaborative environment where you'll work with industry experts and have the opportunity to influence the future of AI infrastructure. The company offers excellent career growth potential and the chance to work on technologies that impact billions of users worldwide.

Last updated an hour ago

Responsibilities For Production Systems Engineer, AI Systems

Support new AI platform introduction into the Meta fleet by driving scale-up and scale-out interface integration
Create experiments and tooling to detect and diagnose hardware/firmware/software health issues
Develop an understanding of AI workload traffic and incorporate it into new product introduction (NPI)
Contribute to enabling hacks for future technology explorations in the AI space
Troubleshoot, diagnose and root cause system failures
Develop visibility through data visualization
Implement systemic solutions to hardware health issues
Drive continuous product quality improvement

Requirements For Production Systems Engineer, AI Systems

Linux

Python

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
4+ years of work experience in network ASIC/Platform development, network product deployment, or Interconnect Technologies
Knowledge of server architecture and components
Experience working with Linux
Knowledge of TCP/IP and experience using iperf
Hands-on troubleshooting and debugging experience

Benefits For Production Systems Engineer, AI Systems

Medical Insurance

Dental Insurance

Vision Insurance

Equity

401k


