Site Reliability Engineer

Fast-growing Series A startup providing the most widely used platform for sophisticated AI agents and chatbots.
Site Reliability
Mid-Level Software Engineer
In-Person
51 - 100 Employees
3+ years of experience
AI · Enterprise SaaS

Description For Site Reliability Engineer

Botpress Technologies Inc. is at the forefront of the AI revolution as the 3rd fastest-growing B2B AI start-up worldwide. With over 1 million AI agents deployed and 700,000+ platform users, and trusted by 35% of Fortune 500 companies, Botpress enables companies to build and deploy advanced AI agents that go beyond conversation into real business logic.

The Site Reliability Engineer role is a critical position within the product team, focused on ensuring the platform's stability, scalability, and security. This hands-on engineering role requires expertise in cloud systems (AWS), with a strong emphasis on observability, uptime, and automation. The ideal candidate will have 3+ years of experience in SRE/DevOps roles, deep AWS knowledge, and proficiency with Linux, containerization, and modern DevOps tools.

The role involves architecting and maintaining scalable infrastructure, optimizing CI/CD pipelines, improving system observability, and managing incident response. You'll work closely with engineering teams to enhance software delivery processes while maintaining high reliability and security standards. The position offers comprehensive benefits including health insurance, educational funding, and a collaborative startup environment.

As part of a Series A startup that takes a deliberate approach to growth (product-led, capital-efficient, and highly focused), you'll have the opportunity to shape the future of enterprise AI and build technology that will define the next era of business automation. The company culture emphasizes ownership, innovation, and continuous learning, making it an ideal environment for talented individuals who want to make a significant impact in the AI industry.

Last updated 3 days ago

Responsibilities For Site Reliability Engineer

  • Architect and maintain scalable infrastructure
  • Design and optimize CI/CD pipelines to ensure smooth delivery of changes
  • Improve observability through advanced monitoring, logging, and alerting
  • Own incident response and support the engineering team in diagnosing and resolving issues
  • Build systems that increase platform reliability, resiliency, and uptime
  • Enforce security best practices across environments and workflows
  • Manage infrastructure as code using tools like Terraform or Pulumi
  • Document operational procedures, disaster recovery plans, and system runbooks

Requirements For Site Reliability Engineer

Linux
Kubernetes
  • 3+ years in SRE, DevOps, or infrastructure engineering roles
  • Deep experience with AWS cloud infrastructure and services (ECS, S3, Lambda, RDS)
  • Comfortable with Linux systems, containerization, and orchestration (e.g. Docker, Kubernetes)
  • Proficient in CI/CD tools, infrastructure-as-code, and automation scripting
  • Familiar with incident management and site reliability principles
  • Experience with observability stacks like Datadog, Grafana, Prometheus, etc.
  • Strong communicator and collaborator across technical teams
  • Calm and systematic under pressure when production issues arise
  • Bonus: Previous experience in a fast-paced startup or SaaS environment

Benefits For Site Reliability Engineer

Medical Insurance
Dental Insurance
Vision Insurance
Education Budget
Parental Leave
  • Work at one of Canada's fastest-growing AI start-ups
  • Work with a talented and passionate team
  • 4 weeks of vacation
  • Paid sick and parental leave
  • Comprehensive health, dental, vision, travel, and life insurance
  • Funding for education and skills improvement
  • Fully stocked fridge and cupboard – we take snacks seriously
  • Your own desk – no 'hot-desk'-style sign-up systems
  • A vibrant office community, including weekly socials
