Production Systems Engineer, AI Systems

Meta builds technologies that help people connect, find communities, and grow businesses, including Facebook, Messenger, Instagram, WhatsApp, and AR/VR experiences.
$132,000 - $191,000
Backend
Senior Software Engineer
In-Person
5,000+ Employees
4+ years of experience
AI · Infrastructure

Description For Production Systems Engineer, AI Systems

Meta is seeking an experienced Production Systems Engineer to join its Release to Production (RTP) team, focusing on AI/ML initiatives and large-scale AI training and inference systems. This role sits at the intersection of hardware and software, working with the server infrastructure that powers Meta's AI services.

The position involves managing the end-to-end hardware lifecycle of Meta's servers, including prototyping experimental hardware, conducting pre-production debugging, implementing system monitoring, and developing automated provisioning solutions. The role directly supports Meta's efforts to scale its AI infrastructure.

As a Production Systems Engineer, you'll work closely with cross-functional teams, including hardware designers, networking teams, system manufacturers, and data center operations, to enable and optimize new systems for production deployment. The role requires deep expertise in network technologies, including NICs, switches, optics, and various network protocols.

The ideal candidate should have strong technical skills in server architecture, Linux systems, and networking protocols, with particular emphasis on AI platform integration and scale-out network technologies. You'll be responsible for creating experimental frameworks, developing diagnostic tools, and implementing solutions for hardware health monitoring.

This is an excellent opportunity for someone passionate about large-scale infrastructure and AI systems, offering competitive compensation ($132,000-$191,000/year) plus bonus, equity, and comprehensive benefits. The role is based in Menlo Park, CA, and offers the chance to work on cutting-edge AI infrastructure at one of the world's leading technology companies.

Meta provides a collaborative environment where you'll work with industry experts and have the opportunity to influence the future of AI infrastructure. The company offers excellent career growth potential and the chance to work on technologies that impact billions of users worldwide.

Responsibilities For Production Systems Engineer, AI Systems

  • Support the introduction of new AI platforms into the Meta fleet by driving scale-up and scale-out interface integration
  • Create experiments and tooling to detect and diagnose hardware/firmware/software health issues
  • Develop an understanding of AI workload traffic and incorporate it into new product introduction (NPI)
  • Contribute to enabling hacks for future technology explorations in the AI space
  • Troubleshoot, diagnose, and root-cause system failures
  • Develop visibility through data visualization
  • Implement systemic solutions to hardware health issues
  • Drive continuous product quality improvement

Requirements For Production Systems Engineer, AI Systems

Linux
Python
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 4+ years of work experience in network ASIC/platform development, network product deployment, or interconnect technologies
  • Knowledge of server architecture and components
  • Experience working with Linux
  • Knowledge of TCP/IP and experience using iperf
  • Hands-on troubleshooting and debugging experience

Benefits For Production Systems Engineer, AI Systems

  • Medical Insurance
  • Dental Insurance
  • Vision Insurance
  • Equity
  • 401k
