Site Reliability Engineer

HappyRobot

AI platform that builds and deploys AI workers to automate communication in the logistics industry.

San Francisco, CA, USA

$120,000 - $220,000

Site Reliability

Senior Software Engineer

In-Person

11 - 50 Employees

1+ year of experience

AI · Logistics

Job Description

HappyRobot is transforming the logistics industry with its AI communication platform. As a Site Reliability Engineer, you'll play a key role in strengthening operational resilience and maintaining system stability as the platform scales.

The company has raised a Series A round from a16z and YC, a sign of strong market validation and growth potential. Its AI workers automate communication across phone calls, emails, and messages, primarily for freight brokers, 3PLs, freight forwarders, and other supply chain enterprises.

In this role, you'll:

Lead the charge on operational resilience and system stability
Own debugging workflows and incident response
Design and implement tools to improve system observability
Help transition from reactive to proactive operations
Build and maintain internal tooling for reliability
Work with cutting-edge AI technology in a fast-paced environment

The ideal candidate should have hands-on experience with production systems, strong problem-solving skills, and proficiency in Python and Go. You'll be working with modern observability tools like Datadog, Prometheus, and Sentry.

This is a high-impact opportunity to shape reliability practices at a growing AI startup. You'll work alongside a world-class team of engineers and have significant autonomy in your role. The company offers competitive compensation including equity, and the chance to work on innovative AI technology that's transforming the logistics industry.

If you're passionate about building reliable systems at scale and want to be part of a fast-growing startup backed by top investors, this role offers the perfect blend of technical challenge and business impact.

Last updated 2 months ago

Responsibilities For Site Reliability Engineer

Lead operational resilience and system stability
Own debugging workflows and incident response
Design and implement tools for system observability
Reduce incident load and improve developer focus
Build and maintain internal tooling for reliability

Requirements For Site Reliability Engineer

Python

Kubernetes

1+ years of hands-on experience debugging production systems
Strong problem-solving skills and ability to dive into unfamiliar backend codebases
Comfort with Python and Go for reading code and writing small tools/utilities
Familiarity with observability and monitoring tools (e.g., Datadog, Prometheus, Sentry)
Clear, calm communication under pressure — especially during live incidents

Benefits For Site Reliability Engineer

Equity

Visa Sponsorship

Competitive salary + equity
Work with world-class team
High ownership and autonomy
Visa sponsorship available
