Site Reliability Engineer

Cover Genius is a Series E insurtech that protects the global customers of the world's largest digital companies, including Booking Holdings, Intuit, Uber, Hopper, and Ryanair.
Site Reliability
Senior Software Engineer
Hybrid
AI · Finance

Description For Site Reliability Engineer

Cover Genius is a Series E insurtech that protects global customers of major digital companies. As a Site Reliability Engineer, you'll ensure reliable operation of production systems, working across technical areas to automate and improve platforms. Key responsibilities include:

  • Analyzing, testing, and modifying systems for reliability and performance
  • Developing observability tools and dashboards
  • Implementing automation tools, CI/CD pipelines, and reducing toil
  • Troubleshooting production issues
  • Applying AWS and GCP knowledge to maintain cloud infrastructure
  • Collaborating with Software Engineers to improve tools and procedures
  • Developing documentation and runbooks
  • Optimizing computing infrastructure costs

Requirements:

  • Understanding of SRE principles and best practices
  • Experience with modern observability tools (ELK/EFK, Prometheus, Grafana)
  • Scripting skills (Bash, Python, Go)
  • Experience with infrastructure as code (Terraform, CloudFormation)
  • Container technology knowledge (Docker, Kubernetes)
  • Linux experience
  • Networking and system architecture understanding
  • AWS/GCP knowledge
  • Bachelor's degree in Computer Science/Engineering or equivalent experience
  • Strong communication and documentation skills
  • Self-motivated learner with attention to detail

Join a diverse team across 20+ countries, recognized as the #1 fastest-growing company in APAC by the Financial Times in 2020. Be part of an innovative company that values being bold, authentic, purposeful, and inspired.

Last updated 7 months ago

Responsibilities For Site Reliability Engineer

  • Analyze, test, and modify systems to improve reliability and optimize performance, particularly at an architectural/infrastructure level
  • Develop and maintain observability tooling and dashboards
  • Implement automation tools, frameworks, and CI/CD pipelines to reduce toil
  • Troubleshoot production issues and coordinate with the development team to streamline code deployments
  • Apply AWS and GCP knowledge and skills to create and maintain cloud infrastructure for software projects
  • Design, develop, and implement software integrations
  • Collaborate with Software Engineers and other team members to improve engineering tools, systems, procedures, and data security
  • Develop and maintain design and troubleshooting documentation and runbooks
  • Optimize and control costs of the company's computing infrastructure

Requirements For Site Reliability Engineer

Linux · Python · Go · Kubernetes
  • Understanding of SRE principles and best practices
  • Experience using and configuring modern observability tools such as ELK/EFK, Prometheus, and Grafana
  • Comfortable scripting and developing internal tooling with Bash and at least one programming language (e.g. Python, Go)
  • Experience working with infrastructure and configuration as code tools such as Terraform, CloudFormation, Chef, and Puppet
  • Experience with container technology such as Docker, and ideally with using and managing Kubernetes clusters
  • Experience working with Linux
  • Solid understanding of networking and system architecture
  • Solid understanding of how to deploy, scale, and monitor web applications and databases
  • Good knowledge of AWS and/or GCP platforms and associated best practices
  • Bachelor's degree in Computer Science/Engineering or equivalent practical experience
  • Strong communication and documentation skills
  • Curious and self-motivated learner
  • Professional approach
  • Good team member
  • Organisational and time management skills
  • Excellent attention to detail
  • Positive approach to change
