Production Systems Engineer, Sustaining

Meta builds technologies that help people connect, find communities, and grow businesses, including Facebook, Messenger, Instagram, WhatsApp, and AR/VR technologies.
$132,000 - $191,000
DevOps
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Production Systems Engineer, Sustaining

Meta is seeking an experienced Production Systems Engineer to join its Release to Production (RTP) team, focusing on the hardware lifecycle of Meta's server infrastructure. This role combines hardware systems expertise with software engineering, working at the intersection of AI/HPC infrastructure and datacenter operations. The position involves collaborating with hardware designers, system manufacturers, and various internal teams to ensure the robust operation of Meta's server infrastructure.

The role requires deep technical knowledge in both hardware and software domains, particularly in AI/HPC systems. You'll be responsible for developing and implementing testing methodologies, troubleshooting complex system issues, and creating tools for hardware/firmware/software health monitoring. The position offers competitive compensation ($132,000-$191,000/year) plus bonus and equity, along with comprehensive benefits.

This is an excellent opportunity for experienced systems engineers who want to work on cutting-edge infrastructure at one of the world's leading technology companies. You'll be part of a team that's essential to maintaining and improving Meta's massive technical infrastructure, working with the latest in AI and HPC technologies. The role offers significant technical challenges and the chance to impact systems operating at global scale.

The ideal candidate will bring a strong background in hardware systems, production support experience, and software development skills. You'll need to be comfortable working with both hardware and software components, and have the ability to diagnose complex system issues. This role provides an opportunity to work on some of the most advanced infrastructure systems while contributing to Meta's mission of connecting people and building the future of social technology.

Last updated 8 hours ago

Responsibilities For Production Systems Engineer, Sustaining

  • Develop robust, industry-leading practices for supporting AI/HPC infrastructure at scale
  • Interface with external vendors and internal teams to understand system architecture
  • Create experiments and tooling to detect and diagnose hardware/firmware/software health issues
  • Implement sustaining workflows across hardware and software stacks
  • Troubleshoot, diagnose, and root-cause system failures
  • Drive discussions with teams on test specification and methodologies

Requirements For Production Systems Engineer, Sustaining

Python
Linux
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 5+ years of experience in hardware systems technologies or supporting production hardware at scale
  • Experience in deploying and productionizing AI/HPC systems at scale
  • Experience in software and hardware co-design for hyperscale systems
  • Experience in object-oriented programming (e.g., Python, C/C++)
  • Experience engineering for a range of server and network datacenter systems

Benefits For Production Systems Engineer, Sustaining

  • Bonus
  • Equity
  • Medical Insurance
  • Dental Insurance
  • Vision Insurance

Jobs Related To Meta Production Systems Engineer, Sustaining

Production Systems Engineer, Hardware

Senior Production Systems Engineer role at Meta, focusing on hardware lifecycle management and infrastructure automation for Meta's global server fleet.

SiteOps Area Capacity Engineer

Senior-level SiteOps Area Capacity Engineer role at Meta, focusing on data center capacity planning and technical oversight with competitive compensation.

Network Operations Engineer

Senior Network Operations Engineer role at Meta, focusing on improving and automating network operations for one of the world's largest technology infrastructures.

Network Production Engineer, Datacenter Infrastructure

Senior Network Production Engineer role at Meta, combining networking expertise with software engineering to build and maintain massive-scale datacenter infrastructure.

Production Engineering

Senior Production Engineering role at Meta focusing on infrastructure, systems reliability, and scalability for Meta's core services and platforms.