Site Reliability Engineer

HighLevel is a cloud-based, all-in-one white-label marketing and sales platform empowering marketing agencies, entrepreneurs, and businesses to elevate their digital presence and drive growth.
India Gate, New Delhi, Delhi, India
Site Reliability
Mid-Level Software Engineer
Remote
1,000 - 5,000 Employees
4+ years of experience
Enterprise SaaS

Description For Site Reliability Engineer

HighLevel, a rapidly growing SaaS platform processing over 15 billion API hits daily, is seeking a Site Reliability Engineer to join its global team of 1,500+ members across 15+ countries. The platform manages 470 terabytes of data and supports over 1 million domain names through 250 microservices. This role is crucial to maintaining the reliability of a platform that serves over 2 million businesses worldwide.

As an SRE, you'll be responsible for ensuring system availability, performance, and scalability. You'll work with cutting-edge technologies including GCP, AWS, and Kubernetes, as well as monitoring tools such as Prometheus and Grafana. The role requires expertise in infrastructure as code and containerization, along with strong programming skills in Python.

The position offers the opportunity to work in a remote-first environment while making a significant impact on a platform that facilitates over 1.5 billion messages and generates 200 million leads monthly. You'll be part of a global community focused on innovation and collaboration, working with modern cloud infrastructure and contributing to the growth of businesses worldwide.

This is an ideal role for an experienced SRE who wants to work with large-scale systems, implement best practices in observability and automation, and be part of a company that's transforming how businesses manage their digital presence. The role combines technical challenges with real-world impact, supporting millions of businesses in their growth journey.

Responsibilities For Site Reliability Engineer

  • Develop and improve observability using monitoring, logging, tracing, and alerting tools
  • Optimize system performance, troubleshoot incidents, and conduct post-mortems/RCA
  • Collaborate with developers to enhance application reliability, scalability, and performance
  • Drive cost optimization efforts in cloud environments
  • Monitor multiple data stores and messaging systems (MongoDB, Redis, Elasticsearch, queue-based systems, etc.)

Requirements For Site Reliability Engineer

Key skills: Python, Kubernetes, MongoDB, Redis

  • 4+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
  • Hands-on experience with GCP and AWS
  • Experience with Terraform, Helm, or equivalent tools
  • Experience with Docker, Kubernetes (GKE)
  • Experience with Prometheus, Grafana, ELK, OpenTelemetry, or similar monitoring/logging tools
  • Proficiency in Python and Bash or other shell scripting
  • Experience with Jenkins, GitHub Actions, ArgoCD, or similar CI/CD tools
  • Experience with on-call rotations and with defining SLOs, SLIs, and SLAs
  • Experience monitoring MongoDB, Redis, Elasticsearch, and queue-based systems
