The Platform ML team at OpenAI builds the ML side of OpenAI's state-of-the-art internal training framework, used to train cutting-edge models. They work on distributed model execution and the interfaces and implementations for model code, training, and inference. The team's priority is to maximize both training throughput and researcher throughput to accelerate progress toward AGI.
As a Platform ML Engineering Manager for Inference, you will:
- Lead critical work on the shared internal inference stack
- Improve and extend the inference stack for research use cases
- Achieve state-of-the-art (SOTA) throughput for the most important research models
- Reduce the time needed to reach efficient inference for new model architectures
- Collaborate with Applied AI engineering to maximize the benefits of the shared internal inference stack
- Hire world-class AI systems engineers in a competitive market
- Coordinate the inference needs of OpenAI's research teams
- Create a diverse, equitable, and inclusive culture
The ideal candidate should have:
- 3+ years of experience in engineering management
- 7+ years as an individual contributor working on high-scale distributed systems
- Experience with ML systems, particularly high-scale distributed training or inference for modern LLMs
- Familiarity with the latest AI research and a working knowledge of how to implement it efficiently
- A track record of building inclusive teams and caring about diversity, equity, and inclusion
OpenAI offers a competitive compensation package, including equity, and is committed to creating a diverse and inclusive work environment.