Taro Logo

Software Engineer, Fleet Hardware Health

AI research and deployment company dedicated to ensuring general-purpose artificial intelligence benefits humanity
$325,000 - $590,000
Backend
Senior Software Engineer
In-Person
1,000 - 5,000 Employees
5+ years of experience
AI

Description For Software Engineer, Fleet Hardware Health

OpenAI is seeking a Software Engineer for their Fleet Hardware team to maintain and optimize their extensive computing infrastructure. This role is crucial for supporting OpenAI's cutting-edge research and products like ChatGPT by ensuring high availability and performance of their supercomputing systems. The position offers a competitive salary range of $325K-$590K plus equity and comprehensive benefits.

The ideal candidate will be responsible for building automation systems, monitoring server health, and managing large-scale server environments. They should have strong expertise in Python/Go programming, Linux systems, and data analysis using SQL. While prior hardware expertise isn't required, knowledge of HPC or distributed systems would be beneficial.

The Fleet team at OpenAI manages critical computing infrastructure spanning data centers, GPUs, and networking systems. This role offers unique opportunities to work with state-of-the-art technology and solve complex problems at scale. The team emphasizes autonomy, ownership, and deep problem-solving skills.

Benefits include comprehensive healthcare, mental health support, generous parental leave (24 weeks for birth parents), 401(k) matching, and annual learning stipend. OpenAI values diversity and maintains an inclusive work environment, making this an excellent opportunity for engineers passionate about advancing AI technology responsibly.

Last updated 2 days ago

Responsibilities For Software Engineer, Fleet Hardware Health

  • Build and maintain automation systems for provisioning and managing server fleets
  • Develop tools to monitor server health, performance, and lifecycle events
  • Collaborate with clusters, networking, and infrastructure teams
  • Partner with external operators to ensure a high level of quality
  • Identify and fix performance bottlenecks and inefficiencies
  • Continuously improve automation to reduce manual work

Requirements For Software Engineer, Fleet Hardware Health

Python
Go
Linux
  • Experience managing large-scale server environments
  • A balance of strengths in building and operationalizing
  • Proficiency in Python, Go, or similar languages
  • Strong Linux, networking, and server hardware knowledge
  • Comfort digging into noisy data with SQL, PromQL, and Pandas

Benefits For Software Engineer, Fleet Hardware Health

Medical Insurance
Dental Insurance
Vision Insurance
Mental Health Assistance
401k
Parental Leave
Education Budget
  • Medical, dental, and vision insurance for you and your family
  • Mental health and wellness support
  • 401(k) plan with 50% matching
  • Generous time off and company holidays
  • 24 weeks paid birth-parent leave & 20-week paid parental leave
  • Annual learning & development stipend ($1,500 per year)
  • Equity compensation

Jobs Related To OpenAI Software Engineer, Fleet Hardware Health