Principal Site Reliability Engineer - Federal Team

Saviynt

Saviynt provides an AI-powered identity platform that manages and governs human and non-human access to organizations' applications, data, and business processes.

Atlanta, GA, USA

Site Reliability

Principal Software Engineer

Hybrid

10+ years of experience

Enterprise SaaS · Cybersecurity

Job Description

Saviynt, a leader in identity security, is seeking a Principal Site Reliability Engineer for its Federal Team. This role sits within Saviynt Labs, the organization responsible for designing, building, and running the company's enterprise identity solutions. The position combines deep technical expertise in cloud platforms (AWS, GCP, Azure) with a focus on maintaining and improving system reliability and performance.

The ideal candidate will be responsible for implementing comprehensive monitoring and alerting systems, ensuring high availability, and maintaining performance standards. They will work with technologies including Kubernetes, major cloud platforms, and observability tools such as Prometheus and Grafana. The role requires a strong background in software development, particularly with Python, Node.js, or Java, combined with extensive experience in cloud operations and monitoring.

This is an opportunity to join a high-growth Platform as a Service company at the forefront of identity security. The position offers significant technical challenges, including work with distributed systems at scale, and the chance to directly impact customer success. The role requires U.S. citizenship and involves working with federal clients, which adds further responsibility and security requirements.

Working at Saviynt means joining a company focused on innovation and engineering excellence, with solutions trusted by Fortune 500 companies and government institutions. The company offers a welcoming and positive work environment, with opportunities for growth and learning through challenging yet rewarding work. The hybrid work arrangement provides flexibility while preserving opportunities to collaborate with cross-functional teams.

Responsibilities For Principal Site Reliability Engineer - Federal Team

Implement monitoring and alerting systems to guarantee high availability and performance
Collaborate with engineering and operations teams to identify critical components
Design and implement strategies for system uptime and reliability
Run the production environment, including monitoring its availability
Build software and systems to monitor platform infrastructure
Monitor and improve reliability, quality, and time-to-market
Provide operational support for large-scale distributed software applications
Gather and analyze metrics for performance tuning and fault finding

Requirements For Principal Site Reliability Engineer - Federal Team

Python

Node.js

Java

Kubernetes

U.S. Citizenship required
Master's degree in engineering, or a bachelor's degree with 7+ years of software engineering experience
10+ years of professional experience in monitoring and alerting roles on major cloud platforms
4+ years of experience in cloud development and observability
Experience with building resilient platforms in AWS cloud environments
3+ years of software development experience with Python, Node.js, or Java
Hands-on experience with container orchestration and Kubernetes
Experience with logging and monitoring tools (Prometheus, Grafana, Datadog, AWS CloudWatch)
Experience implementing advanced observability practices at scale