
Software Engineer, Inference - TL

OpenAI, an AI research and deployment company dedicated to ensuring general-purpose artificial intelligence benefits humanity
$460,000 - $685,000
Backend
Staff Software Engineer
In-Person
1,000 - 5,000 Employees
8+ years of experience
AI

Job Description

OpenAI is seeking a Technical Lead Software Engineer for its Inference team, focusing on high-performance model inference and infrastructure scaling. This role combines technical leadership with hands-on engineering, requiring expertise in CUDA optimization and distributed systems. The position offers a competitive compensation package of $460K-$685K plus equity and comprehensive benefits.

The role involves leading the design and implementation of core inference infrastructure for frontier AI models, optimizing CUDA-based systems, and collaborating with researchers to build scalable inference pipelines. The ideal candidate will have deep expertise in CUDA kernel optimization, experience with PyTorch and NVIDIA's GPU stack, and a strong background in large-scale ML infrastructure.
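To make the kernel-level side of the work concrete, here is a minimal sketch, not taken from the posting, of the most basic tool in this line of work: timing a GPU operation with CUDA events in PyTorch, the standard way to check whether a kernel optimization actually helped. The shapes and dtype are arbitrary placeholders.

    import torch

    assert torch.cuda.is_available(), "requires a CUDA-capable GPU"

    x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    w = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    # Warm up so we measure steady-state kernel time, not one-off setup costs.
    for _ in range(3):
        x @ w
    torch.cuda.synchronize()

    start.record()
    y = x @ w
    end.record()
    torch.cuda.synchronize()  # wait for the kernel to finish before reading the timer
    print(f"matmul took {start.elapsed_time(end):.3f} ms")

Plain wall-clock timers mislead here because CUDA kernel launches are asynchronous; the events and the final synchronize are what make the measurement honest.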

Working at OpenAI means joining a team dedicated to ensuring AI benefits humanity through its products. The company puts safety and human needs at the core of its work and offers the chance to build cutting-edge AI technology. The position includes comprehensive benefits such as medical insurance, mental health support, generous parental leave, and learning stipends.

The role requires both technical depth and leadership skills, as you'll be mentoring engineers and driving technical direction across teams. You'll work in San Francisco, collaborating with researchers, infrastructure teams, and product teams to deliver state-of-the-art AI models efficiently and reliably. This is an opportunity to shape the future of AI technology while working with some of the most advanced models and infrastructure in the field.


Responsibilities For Software Engineer, Inference - TL

  • Lead the design and implementation of core inference infrastructure for serving frontier AI models in production
  • Own and optimize CUDA-based systems and kernels to maximize performance across our fleet
  • Partner with researchers to integrate novel model architectures into performant, scalable inference pipelines
  • Build tooling and observability to detect bottlenecks, guide system tuning, and ensure stable deployment at scale (see the profiler sketch after this list)
  • Collaborate cross-functionally to align technical direction across research, infra, and product teams
  • Mentor engineers on GPU performance, CUDA development, and distributed inference best practices
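
The observability responsibility above typically starts with a profiler. Below is a minimal sketch using PyTorch's built-in torch.profiler to rank the operations in a single inference step by GPU time; the small model is a stand-in, and a real serving stack would profile its own request path instead.

    import torch
    from torch.profiler import profile, ProfilerActivity

    # Stand-in model; any torch.nn.Module on the GPU works the same way.
    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 4096),
    ).cuda().half().eval()

    batch = torch.randn(32, 4096, device="cuda", dtype=torch.float16)

    with torch.inference_mode():
        with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
            model(batch)

    # Rank ops by GPU time to see where the step actually spends its budget.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))

Sorting the averaged results by CUDA time is usually the quickest way to tell whether a serving step is dominated by one heavy kernel or by many small launches, which point to very different optimizations.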

Requirements For Software Engineer, Inference - TL

Python
  • Deep expertise in CUDA, including writing and optimizing high-performance kernels
  • Experience leading complex engineering efforts in ML infrastructure
  • Understanding of the full inference stack
  • Experience working in large, distributed GPU environments (see the tensor-parallel sketch after this list)
  • Strong familiarity with PyTorch and NVIDIA's GPU software stack
  • Systems-level understanding and ability to work with low-level code
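
On the distributed-GPU side, the core primitive behind serving models too large for one device is tensor parallelism: shard a layer's weights across GPUs, compute partial results locally, and recombine them with a collective. Here is a minimal sketch, assuming a single-node torchrun launch with one process per GPU; the layer sizes and random weights are placeholders.

    import os
    import torch
    import torch.distributed as dist

    # torchrun sets MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE, and LOCAL_RANK.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    world = dist.get_world_size()
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    d_model, d_ff = 1024, 4096
    assert d_ff % world == 0, "layer width must divide evenly across ranks"
    shard = d_ff // world

    torch.manual_seed(rank)  # each rank owns a different column slice of the weight
    w_shard = torch.randn(d_model, shard).half().cuda()
    torch.manual_seed(0)     # the input is replicated identically on every rank
    x = torch.randn(8, d_model).half().cuda()

    local_out = x @ w_shard  # [8, shard]: this rank's slice of the activations
    parts = [torch.empty_like(local_out) for _ in range(world)]
    dist.all_gather(parts, local_out)
    full_out = torch.cat(parts, dim=-1)  # [8, d_ff], as if computed on one GPU

    if rank == 0:
        print("gathered output shape:", tuple(full_out.shape))
    dist.destroy_process_group()

Launched with, for example, torchrun --nproc_per_node=2 script.py. Production inference adds batching, KV-cache management, and compute/communication overlap on top, but this shard-compute-gather pattern is the foundation.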

Benefits For Software Engineer, Inference - TL

Medical Insurance
Dental Insurance
Vision Insurance
Mental Health Assistance
401k
Parental Leave
Equity
  • Medical, dental, and vision insurance for you and your family
  • Mental health and wellness support
  • 401(k) plan with 50% matching
  • Generous time off and company holidays
  • 24 weeks of paid birth-parent leave and 20 weeks of paid parental leave
  • Annual learning & development stipend ($1,500 per year)
  • Equity