Senior Site Reliability Engineer (AWS, AI/ML, & APM)

A GovTech company providing cloud-based solutions for government communications and digital services.
United States
$70,000 - $80,000
DevOps
Senior Software Engineer
Remote
1,000 - 5,000 Employees
5+ years of experience
AI · Enterprise SaaS · Government

Description For Senior Site Reliability Engineer (AWS, AI/ML, & APM)

Granicus, a leading GovTech company serving over 5,500 government agencies, is seeking a Senior Site Reliability Engineer to join their SRE team. This role combines traditional SRE responsibilities with cutting-edge AI/ML infrastructure management, making it an exciting opportunity for experienced engineers passionate about public sector technology.

The position offers a unique chance to work on systems that directly impact government-citizen relationships across the US, UK, Australia, New Zealand, and Canada. As an SRE, you'll be responsible for maintaining and improving the reliability of cloud-based solutions that facilitate government communications, website design, meeting management, and digital services.

The ideal candidate will bring 5+ years of SRE experience, with particular expertise in AWS and AI/ML infrastructure. You'll work in a remote-first environment with a globally distributed team, contributing to systems that serve over 300 million citizen subscribers. The role combines technical challenges with meaningful public sector impact, supported by a competitive benefits package and an inclusive company culture.

Key technical areas include AWS services, AI/ML operations, centralized logging and monitoring with the ELK Stack, and automation in scripting and programming languages such as Python, Bash, and Go. The position also offers competitive compensation.

Responsibilities For Senior Site Reliability Engineer (AWS, AI/ML, & APM)

  • Provide production support as part of an on-call rotation
  • Work on customer and internal engineering team tickets
  • Monitor and maintain system health and performance
  • Develop and maintain automation scripts and tools
  • Assist in troubleshooting and resolving incidents
  • Participate in system improvements for reliability and scalability
  • Collaborate with software engineers on application requirements
  • Create and maintain documentation
  • Assist in capacity planning
  • Implement security best practices

Requirements For Senior Site Reliability Engineer (AWS, AI/ML, & APM)

Python · Go · Java · Linux · Kubernetes
  • 5+ years in site reliability engineering, system administration, or similar role
  • Experience supporting AI/ML infrastructure
  • Expertise in Linux/Unix systems and cloud platforms (AWS, Azure, or Google Cloud)
  • Strong proficiency in scripting languages (Python, Bash, Ruby) and programming languages (Go, Java, C++)
  • Experience with ELK Stack for centralized logging, monitoring, and observability
  • Experience with configuration management tools (Ansible, Chef, Puppet)
  • Exposure to AI/ML toolchains including AWS Bedrock, SageMaker, and LLMOps frameworks

Benefits For Senior Site Reliability Engineer (AWS, AI/ML, & APM)

  • Flexible Time Off
  • Medical Insurance (includes 100% paid option)
  • Dental & Vision Insurance
  • 401(k) plan with matching contribution
  • Paid Parental Leave
  • Employer-paid Short and Long Term Disability Insurance
  • Group Term Life Insurance and AD&D Insurance
  • Group legal coverage
