
Principal Site Reliability Engineer - Federal Team

Saviynt provides an AI-powered identity platform that manages and governs human and non-human access to organizations' applications, data, and business processes.
Site Reliability
Principal Software Engineer
Hybrid
10+ years of experience
Enterprise SaaS · Cybersecurity

Job Description

Saviynt, a leader in identity security, is seeking a Principal Site Reliability Engineer for its Federal Team. This role sits within Saviynt Labs, the organization responsible for designing, building, and running the company's enterprise identity solutions. The position combines deep technical expertise in cloud platforms (AWS, GCP, Azure) with a focus on maintaining and improving system reliability and performance.

The ideal candidate will implement comprehensive monitoring and alerting systems, ensure high availability, and maintain performance standards. They will work with Kubernetes, cloud platforms, and modern observability tools such as Prometheus and Grafana. The role requires a strong background in software development, particularly with Python, Node.js, or Java, combined with extensive experience in cloud operations and monitoring.

This is an opportunity to join a high-growth Platform as a Service company at the forefront of identity security. The position offers significant technical challenges, work with distributed systems at scale, and the chance to directly impact customer success. The role requires U.S. citizenship and involves working with federal clients, which brings additional responsibility and security requirements.

Working at Saviynt means joining a company focused on innovation and engineering excellence, whose solutions are trusted by Fortune 500 companies and government institutions. The company offers a welcoming and positive work environment, with opportunities for growth and learning through challenging yet rewarding work. The hybrid work environment provides flexibility while preserving opportunities to collaborate with cross-functional teams.

Last updated a day ago

Responsibilities For Principal Site Reliability Engineer - Federal Team

  • Implement monitoring and alerting systems to guarantee high availability and performance
  • Collaborate with engineering and operations teams to identify critical components
  • Design and implement strategies for system uptime and reliability
  • Run the production environment by monitoring availability
  • Build software and systems to monitor platform infrastructure
  • Monitor and improve reliability, quality, and time-to-market
  • Provide operational support for large-scale distributed software applications
  • Gather and analyze metrics for performance tuning and fault finding

Requirements For Principal Site Reliability Engineer - Federal Team

Python
Node.js
Java
Kubernetes
  • U.S. Citizenship required
  • Master's degree in Engineering or bachelor's degree with 7+ years of software engineering experience
  • 10+ years of professional experience in monitoring and alerting roles on major cloud platforms
  • 4+ years of experience in cloud development and observability
  • Experience with building resilient platforms in AWS cloud environments
  • 3+ years of software development with Python, Node.js, or Java
  • Hands-on experience with container orchestration and Kubernetes
  • Experience with logging and monitoring tools (Prometheus, Grafana, Datadog, Amazon CloudWatch)
  • Experience implementing advanced observability practices at scale