Lead Site Reliability Engineer (Observability)

Xero provides cloud-based accounting software for small businesses, automating routine tasks and connecting businesses with data and advisors.
Melbourne VIC, Australia; Sydney NSW, Australia; Brisbane QLD, Australia
Site Reliability
Staff Software Engineer
Hybrid
1,000 - 5,000 Employees
8+ years of experience
Enterprise SaaS

Description For Lead Site Reliability Engineer (Observability)

Xero is seeking a Lead Site Reliability Engineer to spearhead its observability strategy and enhance engineering capabilities. This role combines hands-on technical leadership with strategic influence, focusing on building and implementing sophisticated monitoring and remediation toolsets. The position is part of the global Site Reliability Engineering team, which operates across New Zealand, Australia, and the USA.

The ideal candidate will take ownership of shaping observability at Xero, driving the adoption of OpenTelemetry and modern solutions to empower teams in building reliable, high-performing services. This role requires deep expertise in observability concepts, experience with various monitoring tools, and strong programming skills in languages like C#, JavaScript, Golang, or Python.

The position offers an opportunity to make a lasting impact on Xero's engineering practices, working closely with Product Managers, Team Leads, and Principal Engineers. The role involves both technical leadership and hands-on implementation, requiring experience in incident response, agile methodologies, and stakeholder management.

Xero offers an attractive benefits package including generous paid leave, health insurance, life insurance, income protection, and an Employee Share Plan. The company promotes a flexible work environment and provides strong support for career development and well-being. This is an excellent opportunity for a seasoned SRE professional looking to shape the reliability and observability practices of a major cloud-based accounting software provider.

Responsibilities For Lead Site Reliability Engineer (Observability)

  • Drive observability strategy and uplift engineering capabilities
  • Design and implement observability solutions
  • Guide technical design and ensure adherence to architectural principles
  • Identify and address failure patterns to enhance system reliability
  • Define and evolve observability and reliability standards
  • Participate in hiring and recruitment
  • Provide hands-on technical mentorship
  • Work closely with Product Managers, Team Leads, and Principal Engineers
  • Participate in on-call rotations

Requirements For Lead Site Reliability Engineer (Observability)

Kubernetes
Go
Python
JavaScript
  • Deep knowledge of reliability and observability concepts
  • Experience implementing observability in large, distributed cloud environments (AWS)
  • Experience with monitoring tools like Prometheus, VictoriaMetrics, Jaeger, New Relic, Datadog, Dynatrace
  • Proficiency in programming languages such as C#, JavaScript, Golang, or Python
  • Experience in incident response and resolving production incidents
  • Experience in agile software development environments
  • Strong stakeholder engagement and influence skills
  • Experience managing observability platforms at scale

Benefits For Lead Site Reliability Engineer (Observability)

Medical Insurance
Vision Insurance
Dental Insurance
Mental Health Assistance
Parental Leave
Equity
  • Generous paid leave
  • Health insurance
  • Life insurance
  • Income protection
  • Wellbeing and sports programmes
  • Employee resource groups
  • 26 weeks paid parental leave for primary caregivers
  • Employee Share Plan
  • Flexible working
  • Career development
  • Employee Assistance Program
  • Mental health care access for employees and family
