Software Engineer, Frontier Clusters Infrastructure

AI research and deployment company dedicated to ensuring general-purpose artificial intelligence benefits humanity
$380,000 - $460,000
Senior Software Engineer
Hybrid
1,000 - 5,000 Employees
5+ years of experience
AI

Description For Software Engineer, Frontier Clusters Infrastructure

OpenAI is seeking a Senior Software Engineer to join its Frontier Clusters Infrastructure team, which builds and manages some of the world's largest supercomputers for cutting-edge AI model training. This role combines distributed systems engineering with high-performance computing, focusing on designing and operating large-scale compute clusters that power advanced AI research. Based in San Francisco with a hybrid work model, the position offers a salary range of $380K-$460K plus equity and comprehensive benefits.

The ideal candidate will work on developing software for orchestrating massive compute clusters, managing resource allocation, and automating cluster lifecycle operations. They should have strong expertise in distributed systems, experience with cloud platforms (particularly Azure), and proficiency in languages like Python and Go. The role requires someone who can interface effectively with researchers and engineering teams while ensuring the reliability and security of cluster systems.

OpenAI offers an exceptional benefits package including comprehensive health coverage, mental wellness support, generous parental leave, and professional development opportunities. The company's mission-driven approach focuses on ensuring AI benefits humanity, making this an opportunity to work on transformative technology while maintaining a strong emphasis on safety and ethical considerations.

The position combines the technical challenges of hyperscale infrastructure with work at the forefront of AI development. You'll be part of a team that directly enables OpenAI's most cutting-edge model training, making this role a strong fit for anyone interested in both distributed systems and the advancement of artificial intelligence.

Last updated 4 hours ago

Responsibilities For Software Engineer, Frontier Clusters Infrastructure

  • Develop and optimize high-performance cluster systems to support compute-intensive AI workloads
  • Ensure the reliability, scalability, and efficiency of our cluster infrastructure
  • Interface with researchers and engineering teams to understand compute needs and optimize resource allocation
  • Implement and uphold security measures for all cluster systems

Requirements For Software Engineer, Frontier Clusters Infrastructure

Python
Go
Kubernetes
  • Strong understanding of distributed systems principles, with a proven track record of designing scalable, reliable, and secure compute clusters
  • Strong programming skills, with experience in Python, Go, or similar languages
  • Experience working in public cloud environments (especially Azure)
  • Familiarity with high-performance computing, GPU workloads, or AI/ML compute patterns
  • Ability to take initiative and work in a fast-paced, dynamic environment

Benefits For Software Engineer, Frontier Clusters Infrastructure

Medical Insurance
Dental Insurance
Vision Insurance
Mental Health Assistance
401k
Parental Leave
Education Budget
Equity
Relocation Benefits
  • Medical, dental, and vision insurance for you and your family
  • Mental health and wellness support
  • 401(k) plan with 50% matching
  • Generous time off and company holidays
  • Paid parental leave (24 weeks of paid birth-parent leave and 20 weeks of paid parental leave)
  • Annual learning & development stipend ($1,500 per year)
  • Equity
  • Relocation assistance

Jobs Related To OpenAI Software Engineer, Frontier Clusters Infrastructure

Senior System Software Engineer, NCCL - Partner Enablement

Senior System Software Engineer role at NVIDIA focusing on NCCL partner enablement and GPU communications libraries for AI and HPC applications.

Senior Software Engineer, Systems Infrastructure

Senior Software Engineer role at LinkedIn building next-gen distributed systems infrastructure and platforms that power LinkedIn's core services at massive scale.

Senior Software Engineer, Compute

Senior Software Engineer position at Aurora, focusing on compute workflows and distributed systems for self-driving technology.

Software Engineer with Systems Depth

Senior Software Engineer role at Datadog focusing on systems infrastructure, offering $130K-$300K salary plus benefits, based in Denver.

Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

Senior Software Engineering role focusing on DGX Cloud infrastructure automation and distributed systems at NVIDIA.