The Platform Runtime team at OpenAI builds low-level framework components that power ML training systems. As an Engineering Manager for Distributed Systems, you'll manage teams responsible for delivering powerful APIs that orchestrate thousands of computers moving and persisting vast amounts of data. The role requires optimizing end-to-end systems, understanding high-performance I/O, and scaling to new supercomputers while maintaining stability and performance. The team works in Python and Rust in a fast-paced, iterative environment. You'll build teams that bring this technology to millions of users worldwide while ensuring safety and reliability. The role is based in San Francisco, CA, or remote in the United States (preferably West Coast), with a hybrid work model of 3 days in the office per week. Relocation assistance is offered.
Key responsibilities and qualifications:
OpenAI values diversity, is an equal opportunity employer, and is committed to providing reasonable accommodations to applicants with disabilities.