Staff Software Engineer, Compute

fal

fal is a company focused on AI infrastructure and compute platforms for managing and orchestrating AI workloads.

San Francisco, CA, USA

$180,000 - $250,000

Backend

Staff Software Engineer

In-Person

11 - 50 Employees

8+ years of experience

Description For Staff Software Engineer, Compute

fal is seeking a Staff Software Engineer to join its Compute team in San Francisco. This role focuses on building and maintaining large-scale computation platforms for AI workloads. The position requires expertise in backend systems that handle workload orchestration, request routing, and resource management. The ideal candidate will have deep knowledge of cloud infrastructure and Linux systems.

The role involves working with Kubernetes, Python, and a range of infrastructure tools to manage GPU computing resources. You'll be responsible for developing the core platform that handles AI workload orchestration and GPU server capacity management, and for maintaining the infrastructure layer using tools like Terraform and Ansible.

This is an excellent opportunity for an experienced engineer who wants to work on challenging problems in the AI infrastructure space. The company offers competitive compensation ($180K-$250K plus equity) and comprehensive benefits including health insurance and visa sponsorship. While the position is primarily in-person in San Francisco, remote work may be considered for exceptional candidates.

The ideal candidate will be a self-starter with strong communication skills and deep experience in distributed systems and infrastructure management. You'll have the opportunity to shape the future of fal's infrastructure and work on interesting technical challenges while having a significant impact on the company's growth and success.


Responsibilities For Staff Software Engineer, Compute

Develop and maintain the core Python platform for request routing, AI workload orchestration, and GPU server capacity management
Develop and maintain the infrastructure layer using Terraform, Ansible, and provider APIs
Own K8s, FluxCD, Nomad, Prometheus, Thanos, Grafana, Loki, distributed networking, and storage
Create the vision and foundation for future infrastructure development

Requirements For Staff Software Engineer, Compute

Python

Kubernetes

Deep experience building distributed compute platforms, preferably with Python
Strong foundation in managing both cloud and bare metal infrastructure
Solid understanding of K8s and of running CI/CD on it
Excellent communication skills
Self-starter who executes quickly, takes ownership, and constantly seeks improvement

Benefits For Staff Software Engineer, Compute

Equity

Medical Insurance

Dental Insurance

Vision Insurance

Visa Sponsorship

Relocation Benefits

Employee-friendly equity terms (early exercise, extended exercise)
Learning and growth opportunities
Visa sponsorship and relocation assistance to San Francisco
Health, dental, and vision insurance (US)
Regular team events and offsites
