Principal Site Reliability Engineer - Federal Team

Saviynt provides an AI-powered identity platform that manages and governs human and non-human access to organizations' applications, data, and business processes.
Site Reliability
Principal Software Engineer
Hybrid
10+ years of experience
Enterprise SaaS · Cybersecurity

Job Description

Saviynt, a leader in identity security, is seeking a Principal Site Reliability Engineer for its Federal Team. This role sits within Saviynt Labs, the organization responsible for designing, building, and running the company's enterprise identity solutions. The position combines deep technical expertise in cloud platforms (AWS, GCP, Azure) with a focus on maintaining and improving system reliability and performance.

The ideal candidate will be responsible for implementing comprehensive monitoring and alerting systems, ensuring high availability, and maintaining performance standards. They'll work with Kubernetes, cloud platforms, and modern observability tools such as Prometheus and Grafana. The role requires a strong background in software development, particularly with Python, Node.js, or Java, combined with extensive experience in cloud operations and monitoring.

This is an opportunity to join a high-growth Platform as a Service company that's using AI to revolutionize identity security. The position offers significant technical challenges, working with distributed systems at scale, and the chance to directly impact how Fortune 500 companies and government institutions manage digital security. The role requires U.S. citizenship and comes with the responsibility of maintaining strict security and privacy standards.

Working in a hybrid environment in Atlanta, you'll collaborate with cross-functional teams to drive engineering excellence and innovation. Saviynt offers a welcoming and positive work environment where you'll experience tremendous growth through challenging yet rewarding work that directly impacts customers. The company is committed to equal opportunity employment and welcomes diverse perspectives and backgrounds.

Last updated 6 days ago

Responsibilities For Principal Site Reliability Engineer - Federal Team

  • Implement monitoring and alerting systems to guarantee high availability and performance
  • Collaborate with engineering and operations teams to identify critical components
  • Design and implement strategies for system uptime and reliability
  • Run the production environment by monitoring availability
  • Build software and systems to monitor platform infrastructure
  • Monitor and improve reliability, quality, and time-to-market
  • Provide operational support for large-scale distributed software applications
  • Gather and analyze metrics for performance tuning and fault finding

Requirements For Principal Site Reliability Engineer - Federal Team

Python · Node.js · Java · Kubernetes

  • U.S. Citizenship required
  • Master's degree in engineering, or a bachelor's degree with 7+ years of software engineering experience
  • 10+ years of professional experience in monitoring and alerting roles on major cloud platforms
  • 4+ years of experience in cloud development and observability
  • Experience with building resilient platforms in AWS cloud environments
  • 3+ years of software development experience with Python, Node.js, or Java
  • Hands-on experience with container orchestration and Kubernetes
  • Experience with logging and monitoring tools (Prometheus, Grafana, Datadog, AWS CloudWatch)
  • Experience implementing advanced observability practices at scale