
Site Reliability Engineer

A product-focused startup building tools that help teams make better decisions through great research, with a team of 14 engineers.
Site Reliability
Senior Software Engineer
Remote
11 - 50 Employees
4+ years of experience
Enterprise SaaS

Description For Site Reliability Engineer

Great Question, a product-focused startup with 14 engineers, is seeking its first dedicated DevOps/Infra hire to take ownership of platform health, reliability, and scalability. This Site Reliability Engineer role offers end-to-end ownership of critical infrastructure systems and partners directly with the engineering team to improve systems and reduce toil.

The role spans observability, reliability, infrastructure management, capacity planning, developer experience, security compliance, and cloud cost optimization. You'll be responsible for maintaining service SLOs, improving incident response, managing Terraform infrastructure, and leading AWS migrations.

As a foundational hire, you'll have the opportunity to shape the systems and culture of how the company builds and runs software. The position offers clear growth paths into platform leadership, Head of Infra/SRE, or Principal Engineer roles as the company expands. The technical stack includes AWS, Terraform, GitHub Actions, Docker, Kubernetes, Datadog, PostgreSQL, Redis, and Rails.

The ideal candidate has 4-8+ years of experience in DevOps or SRE roles, strong AWS expertise, and proficiency with infrastructure-as-code tools. You'll work in a high-autonomy environment with a team that values thoughtfulness, speed, and care. The role offers significant impact potential, trust in decision-making, and opportunities to grow with the company.

This remote position combines technical challenges with strategic platform development, making it perfect for someone who views infrastructure as a product and wants to build lasting foundations for a growing company. You'll have support from leadership while maintaining the freedom to chart your own path as the company grows.

Last updated 11 days ago

Responsibilities For Site Reliability Engineer

  • Define and maintain service SLOs, dashboards, and alerts
  • Improve incident detection and response
  • Lead incident postmortems and manage follow-up actions
  • Maintain and improve Terraform-managed infrastructure
  • Lead staging infrastructure migration to AWS
  • Optimize use of monitoring tools
  • Identify and address performance bottlenecks
  • Implement automated scaling strategies
  • Increase CI/CD pipeline reliability and performance
  • Implement SOC2 compliance protocols
  • Monitor and optimize cloud spend

Requirements For Site Reliability Engineer

PostgreSQL
Redis
Kubernetes
  • 4-8+ years of experience in DevOps, SRE, or Infrastructure roles
  • Hands-on AWS experience (EC2, RDS, VPCs, etc.)
  • Experience with Terraform, GitHub Actions, Docker, and PostgreSQL
  • Track record of improving observability and reducing incident response times
  • Experience in high-autonomy, high-ownership environments
  • Cost-conscious mindset for infrastructure and cloud spend
  • Ability to build tools that give engineers leverage
