Distributed Training Engineer, Sora

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity.
$295,000 - $440,000
Machine Learning
Staff Software Engineer
Hybrid
1,000 - 5,000 Employees
7+ years of experience
This job posting may no longer be active.

Description For Distributed Training Engineer, Sora

The Sora team at OpenAI is working on making video a key capability of OpenAI's foundation models. As a Distributed Training Engineer for Sora, you will improve the training throughput of our internal training framework and enable researchers to experiment with new ideas. This role requires strong engineering skills, the ability to write bug-free machine learning code, and deep knowledge of supercomputer performance.

Key responsibilities include:

  • Collaborating with researchers to develop systems-efficient video models and architectures
  • Applying the latest techniques to achieve impressive hardware efficiency for training runs
  • Profiling and optimizing the training framework

The ideal candidate has experience with multi-modal ML pipelines, strong software engineering skills (particularly in Python), experience understanding and optimizing training kernels, and a passion for understanding stable training dynamics.

OpenAI offers a competitive compensation package, including a salary range of $295K – $440K, generous equity, and comprehensive benefits such as medical insurance, mental health support, 401(k) matching, unlimited time off, and paid parental leave.

This role is based in San Francisco, CA, with a hybrid work model of 3 days in the office per week. OpenAI is committed to diversity, equality, and creating an inclusive environment for all employees.

Last updated 10 months ago

Responsibilities For Distributed Training Engineer, Sora

  • Collaborate with researchers to enable them to develop systems-efficient video models and architectures
  • Apply the latest techniques to our internal training framework to achieve impressive hardware efficiency for our training runs
  • Profile and optimize our training framework

Requirements For Distributed Training Engineer, Sora

  • Experience working with multi-modal ML pipelines
  • Strong software engineering skills and proficiency in Python
  • Experience understanding and optimizing training kernels
  • Passion for understanding stable training dynamics
  • Ability to dive deep into systems implementations to improve performance and maintainability

Benefits For Distributed Training Engineer, Sora

  • Medical, dental, and vision insurance for you and your family
  • Mental health and wellness support
  • 401(k) plan with 50% matching
  • Unlimited time off and 13 company holidays per year
  • Paid parental leave (20 weeks) and family-planning support
  • Annual learning & development stipend ($1,500 per year)
  • Equity