Site Reliability Engineer

AI platform that builds and deploys AI workers to automate communication in the logistics industry.
$120,000 - $220,000
Site Reliability
Senior Software Engineer
In-Person
11 - 50 Employees
1+ year of experience
AI · Logistics

Job Description

HappyRobot builds an AI communication platform for the logistics industry. As a Site Reliability Engineer, you'll play a key role in strengthening operational resilience and maintaining system stability as the company scales.

The company has raised a Series A round from a16z and YC. Its AI workers automate communication across channels including phone calls, emails, and messages, primarily for freight brokers, 3PLs, freight forwarders, and other supply chain enterprises.

In this role, you'll:

  • Lead the charge on operational resilience and system stability
  • Own debugging workflows and incident response
  • Design and implement tools to improve system observability
  • Help transition from reactive to proactive operations
  • Build and maintain internal tooling for reliability
  • Work with cutting-edge AI technology in a fast-paced environment

The ideal candidate has hands-on experience with production systems, strong problem-solving skills, and comfort reading and writing Python and Go. You'll work with observability tools such as Datadog, Prometheus, and Sentry.

This is a high-impact opportunity to shape reliability practices at a growing AI startup. You'll work alongside a world-class team of engineers and have significant autonomy in your role. The company offers competitive compensation, including equity, and the chance to work on AI technology that is transforming the logistics industry.

If you're passionate about building reliable systems at scale and want to join a fast-growing startup backed by top investors, this role combines technical challenge with direct business impact.

Last updated 2 months ago

Responsibilities For Site Reliability Engineer

  • Lead operational resilience and system stability
  • Own debugging workflows and incident response
  • Design and implement tools for system observability
  • Reduce incident load and improve developer focus
  • Build and maintain internal tooling for reliability

Requirements For Site Reliability Engineer

Python
Go
Kubernetes
  • 1+ years of hands-on experience debugging production systems
  • Strong problem-solving skills and ability to dive into unfamiliar backend codebases
  • Comfort with Python and Go for reading code and writing small tools/utilities
  • Familiarity with observability and monitoring tools (e.g., Datadog, Prometheus, Sentry)
  • Clear, calm communication under pressure, especially during live incidents

Benefits For Site Reliability Engineer

Equity
Visa Sponsorship
  • Competitive salary + equity
  • Work with a world-class team
  • High ownership and autonomy
  • Visa sponsorship available