Site Reliability Engineer

Orgvue is an organisational design and planning platform that empowers businesses to transform their workforce by understanding work and skills.
Site Reliability
Principal Software Engineer
Hybrid
101 - 500 Employees
8+ years of experience
Enterprise SaaS

Description For Site Reliability Engineer

Orgvue, headquartered in London with global offices, is seeking a Principal Site Reliability Engineer to join its team. The role combines technical leadership with hands-on expertise in AWS and Kubernetes infrastructure. As a senior technical leader, you'll be responsible for scaling and hardening the company's cloud infrastructure while building a world-class reliability culture. The position involves working across product, platform, and operations teams to ensure system reliability, observability, and resilience at scale. You'll be instrumental in defining SLOs, implementing cloud infrastructure strategies, and mentoring teams on SRE practices.

The company offers a benefits package that includes hybrid working, healthcare, wellbeing programs, and various lifestyle perks. The role suits an experienced SRE leader who combines technical expertise with strategic vision and strong communication skills, and who is passionate about building robust, scalable systems while fostering a culture of operational excellence.

Last updated a day ago

Responsibilities For Site Reliability Engineer

  • Define and enforce SLOs, SLIs, and error budgets across critical services
  • Craft and implement cloud infrastructure and tooling strategy
  • Work across the organization to level up SRE practices
  • Implement robust observability across metrics, logs, and traces
  • Guide the team in building automated, self-healing systems
  • Own and evolve incident response processes
  • Mentor engineers on best practices
  • Drive Infrastructure as Code using Terraform, Kubernetes, CloudFormation and GitOps practices
  • Collaborate with security, DevOps, and software teams
  • Evaluate and introduce tools for performance and reliability improvement

Requirements For Site Reliability Engineer

Kubernetes
Linux
  • Demonstrable experience leading SRE transformations
  • Deep hands-on expertise with Kubernetes (EKS preferred) in production environments
  • Strong experience with AWS core services
  • Expert in Infrastructure as Code using tools such as Terraform
  • Strong background in observability: metrics, visualization, logging, and tracing
  • Understanding of automation, SDLC, CI/CD pipelines, deployment automation
  • Proven experience with incident management, disaster recovery planning, root cause analysis

Benefits For Site Reliability Engineer

Medical Insurance
Dental Insurance
Vision Insurance
Mental Health Assistance
  • Hybrid working - 1+ days a week in the London office
  • Sanctus Coaching
  • Virtual fitness sessions
  • Wellbeing webinars
  • Annual Wellbeing day
  • Subsidised Gym Membership
  • Private Medical Insurance (including Dental and Vision)
  • Life Assurance
  • 25 days holiday (increasing to 30 days)
  • Summer Fridays (half-day Fridays for July and August)
  • 5% employer pension contribution
  • Season Ticket Loan
  • Cycle to Work Scheme
  • Annual Discretionary Bonus

Jobs Related To Orgvue Site Reliability Engineer

Principal Software Engineer - Site Reliability Engineering

Lead SRE position at Roblox focusing on platform reliability, system scalability, and team leadership, offering competitive compensation and comprehensive benefits.

Principal AI Infrastructure SRE Engineer

Principal AI Infrastructure Site Reliability Engineering role at NVIDIA focusing on maintaining and optimizing AI infrastructure systems.

Systems Engineering (Principal or Architect)

Principal/Architect Systems Engineering role at Salesforce focusing on reliability engineering and post-incident analysis, offering $230,800-$384,100 in San Francisco.

Director, Software Engineering, Site Reliability

Lead a team of 40+ Site Reliability Engineers at LinkedIn, driving infrastructure reliability and automation for critical distributed systems while shaping technical strategy and culture.