Senior Site Reliability Engineer

Building the platform that care operations run on, using AI, RTLS, and EHR data to enable self-learning agents to automate workflows in healthcare.
Site Reliability
Senior Software Engineer
Remote
101 - 500 Employees
5+ years of experience
Healthcare · Enterprise SaaS · AI

Description For Senior Site Reliability Engineer

Kontakt.io is revolutionizing healthcare operations with its platform, which uses AI, RTLS, and EHR data to optimize care delivery. As a Senior Site Reliability Engineer, you'll play a crucial role in ensuring the reliability and performance of its cloud-based, real-time platform, which serves healthcare facilities with a commitment to 99.99% uptime.

The position offers an opportunity to work on mission-critical systems that directly impact healthcare delivery efficiency. You'll be responsible for designing and implementing self-healing, fault-tolerant systems, managing containerized environments, and developing robust monitoring solutions using cutting-edge technologies like Prometheus, Grafana, and OpenTelemetry.

The role combines technical challenges with meaningful impact: you'll be working on systems that help reduce waste, optimize resources, and improve patient care while delivering 10X ROI to healthcare facilities. You'll join a high-performing team of engineers, AI experts, and healthcare innovators solving real-world challenges.

Key technical aspects include working with AWS cloud infrastructure, Kubernetes orchestration, infrastructure as code using Terraform, and implementing comprehensive observability solutions. The position requires expertise in distributed systems, security compliance (HIPAA, SOC 2), and automated deployment processes.

This is a remote position for candidates in the East Coast/New York City region. You'll collaborate with cross-functional teams to align SRE initiatives with business goals. The role requires 5+ years of experience in SRE or Cloud Infrastructure, with a strong background in scaling high-traffic, mission-critical platforms.

If you're passionate about using technology to improve healthcare operations and want to work with cutting-edge automation and observability tools while ensuring critical healthcare services remain available 24/7, this role offers an excellent opportunity to make a significant impact in the healthcare technology sector.


Responsibilities For Senior Site Reliability Engineer

  • Ensure 99.99% uptime of cloud platform by maintaining highly reliable and resilient infrastructure
  • Design and implement self-healing, fault-tolerant systems
  • Define and maintain SLIs, SLOs, and SLAs
  • Architect and optimize scalable cloud infrastructure (AWS)
  • Improve and manage containerized environments (Kubernetes, Docker)
  • Implement and enhance infrastructure as code (Terraform)
  • Develop monitoring, alerting, and logging systems using Prometheus, Grafana, OpenTelemetry, and Datadog
  • Participate in incident response and on-call rotations
  • Conduct blameless postmortems
  • Automate deployment, scaling, and failover mechanisms
  • Contribute to disaster recovery and business continuity planning
  • Work with Product, Engineering, and Infrastructure teams

Requirements For Senior Site Reliability Engineer

Kubernetes · Redis · PostgreSQL
  • 5+ years of experience in Site Reliability Engineering or Cloud Infrastructure
  • Proven success scaling high-traffic, mission-critical platforms in SaaS, IoT, or healthcare
  • Deep expertise in cloud platforms (AWS), Kubernetes, and distributed systems
  • Strong background in monitoring, logging, and observability with Prometheus and OpenTelemetry
  • Deep knowledge of CI/CD automation, GitOps, and infrastructure as code (Terraform)
  • Strong understanding of network security, access management, and compliance frameworks (HIPAA, SOC 2)
  • Experience with healthcare IT, including EHR data, FHIR, and HL7 interoperability (bonus)
  • Expertise in real-time distributed systems, event-driven architectures, or large-scale data pipelines (bonus)

