Site Reliability Engineer

HighLevel

HighLevel is a cloud-based, all-in-one white-label marketing and sales platform empowering marketing agencies, entrepreneurs, and businesses to elevate their digital presence and drive growth.

India Gate, New Delhi, Delhi, India

Site Reliability

Mid-Level Software Engineer

Remote

1,000 - 5,000 Employees

4+ years of experience

Enterprise SaaS

Description For Site Reliability Engineer

HighLevel, a rapidly growing SaaS platform processing over 15 billion API hits daily, is seeking a Site Reliability Engineer to join its global team of 1,500+ members across 15+ countries. The platform manages 470 terabytes of data and supports over 1 million domain names through 250 microservices. This role is crucial to maintaining the reliability of a platform that serves over 2 million businesses worldwide.

As an SRE, you'll be responsible for ensuring system availability, performance, and scalability. You'll work with GCP, AWS, and Kubernetes, along with monitoring tools such as Prometheus and Grafana. The role requires expertise in infrastructure as code and containerization, and strong programming skills in Python.

The position offers the opportunity to work in a remote-first environment while making a significant impact on a platform that facilitates over 1.5 billion messages and generates 200 million leads monthly. You'll be part of a global community focused on innovation and collaboration, working with modern cloud infrastructure and contributing to the growth of businesses worldwide.

This is an ideal role for an experienced SRE who wants to work with large-scale systems, implement best practices in observability and automation, and be part of a company that's transforming how businesses manage their digital presence. The role combines technical challenges with real-world impact, supporting millions of businesses in their growth journey.

Last updated 3 days ago

Responsibilities For Site Reliability Engineer

Develop and improve observability using monitoring, logging, tracing, and alerting tools
Optimize system performance, troubleshoot incidents, and conduct post-mortems/RCA
Collaborate with developers to enhance application reliability, scalability, and performance
Drive cost optimisation efforts in cloud environments
Monitor multiple data stores (MongoDB, Redis, Elasticsearch, queue-based systems, etc.)

Requirements For Site Reliability Engineer

Python

Kubernetes

MongoDB

Redis

4+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
Hands-on experience with GCP and AWS
Experience with Terraform, Helm, or equivalent tools
Experience with Docker, Kubernetes (GKE)
Experience with Prometheus, Grafana, ELK, OpenTelemetry, or similar monitoring/logging tools
Proficiency in Python, Bash, or Shell scripting
Experience with Jenkins, GitHub Actions, ArgoCD, or similar tools
Experience with on-call rotations, SLOs, SLIs, and SLAs
Experience monitoring MongoDB, Redis, Elasticsearch, and queue-based systems

