Principal Site Reliability Engineer - Federal Team

Saviynt

Saviynt provides an AI-powered identity platform that manages and governs human and non-human access to organizations' applications, data, and business processes.

Atlanta, GA, USA

Site Reliability

Principal Software Engineer

Hybrid

10+ years of experience

Enterprise SaaS · Cybersecurity

Job Description

Saviynt, a leader in identity security, is seeking a Principal Site Reliability Engineer for its Federal Team. This role sits within Saviynt Labs, the organization responsible for designing, building, and running the company's enterprise identity solutions. The position combines deep technical expertise in cloud platforms (AWS, GCP, Azure) with a focus on maintaining and improving system reliability and performance.

The ideal candidate will be responsible for implementing comprehensive monitoring and alerting systems, ensuring high availability, and maintaining performance standards. They'll work with technologies including Kubernetes, major cloud platforms, and modern observability tools such as Prometheus and Grafana. The role requires a strong background in software development, particularly with Python, Node.js, or Java, combined with extensive experience in cloud operations and monitoring.

This is an opportunity to join a high-growth Platform as a Service company that's using AI to transform identity security. The position offers significant technical challenges, including work with distributed systems at scale, and the chance to directly influence how Fortune 500 companies and government institutions manage digital security. The role requires U.S. citizenship and carries the responsibility of upholding strict security and privacy standards.

Working in a hybrid environment in Atlanta, you'll collaborate with cross-functional teams to drive engineering excellence and innovation. Saviynt offers a welcoming and positive work environment where you'll experience tremendous growth through challenging yet rewarding work that directly impacts customers. The company is committed to equal opportunity employment and welcomes diverse perspectives and backgrounds.


Responsibilities For Principal Site Reliability Engineer - Federal Team

Implement monitoring and alerting systems to guarantee high availability and performance
Collaborate with engineering and operations teams to identify critical components
Design and implement strategies for system uptime and reliability
Run the production environment by monitoring availability
Build software and systems to monitor platform infrastructure
Monitor and improve reliability, quality, and time-to-market
Provide operational support for large-scale distributed software applications
Gather and analyze metrics for performance tuning and fault finding

Requirements For Principal Site Reliability Engineer - Federal Team

Python

Node.js

Java

Kubernetes

U.S. Citizenship required
Master's degree in engineering, or a bachelor's degree with 7+ years of software engineering experience
10+ years of professional experience in monitoring and alerting roles on major cloud platforms
4+ years of experience in cloud development and observability
Experience with building resilient platforms in AWS cloud environments
3+ years of software development with Python, Node.js, or Java
Hands-on experience with container orchestration and Kubernetes
Experience with logging and monitoring tools (Prometheus, Grafana, Datadog, AWS CloudWatch)
Experience implementing advanced observability practices at scale