Staff Site Reliability Engineer

The leading independent provider of identity for the enterprise, enabling organizations to securely connect people to technology.
Site Reliability
Staff Software Engineer
Remote
5,000+ Employees
6+ years of experience
Enterprise SaaS · Cybersecurity

Description For Staff Site Reliability Engineer

Okta, The World's Identity Company, is seeking a Staff Site Reliability Engineer to join its Workforce Identity Cloud (WIC) team. As a critical member of Technical Operations, you'll embrace the "Always On" motto while building reliable and performant systems through automation. The role involves working with a critical SaaS platform used by millions of customers daily, managing complex containerized deployments, and driving significant replatforming initiatives.

You'll be instrumental in navigating the transition of critical components between container orchestration systems while ensuring zero downtime. The position requires deep technical expertise in cloud infrastructure, particularly AWS, along with strong programming skills in languages like Python, Rust, or Go. You'll work with cutting-edge technologies including Kubernetes and various cloud services, while being part of a global team supporting 24x7 operations.

The ideal candidate brings 6+ years of SRE experience, strong Linux fundamentals, and expertise in infrastructure as code. You'll have the opportunity to influence architectural decisions, mentor team members, and drive best practices across WIC engineering. Okta offers a dynamic work environment with the best tools and technology, along with comprehensive benefits and opportunities for social impact through Okta for Good.

This role combines technical leadership with hands-on engineering, requiring both depth in systems architecture and breadth across modern cloud technologies. You'll be joining a company at the forefront of identity and access management, serving over 19,300 organizations including major enterprises like JetBlue, Nordstrom, and T-Mobile. The position offers the chance to work on challenging technical problems at scale while contributing to a product that securely connects millions of users to their essential technologies.

Last updated 21 hours ago

Responsibilities For Staff Site Reliability Engineer

  • Become deeply familiar with all aspects of a critical SaaS platform used by millions of customers daily
  • Navigate a replatforming initiative, moving critical components between container orchestration systems with zero downtime
  • Engage with stakeholders to understand component boundaries and dependencies
  • Drive SDLC improvements for microservices and features
  • Identify and automate manual processes
  • Support 24x7 online environment as part of global on-call rotation
  • Advocate best practices for scalable, reliable, and resilient systems

Requirements For Staff Site Reliability Engineer

Python
Go
Rust
Kubernetes
Linux
  • 6+ years of experience as a site reliability or platform engineer
  • Familiarity with large-scale containerised microservice and monolithic deployments
  • Experience with AWS and other cloud providers
  • Knowledge of CI/CD principles, Linux fundamentals, OS hardening
  • Strong skills in Python, Rust, or Go
  • Understanding of relational and non-relational datastores
  • 3+ years experience with Ansible, Chef, Terraform or other IaC tools
  • BS in computer science (or equivalent experience)

Benefits For Staff Site Reliability Engineer

Medical Insurance
Dental Insurance
Vision Insurance
401k
Parental Leave
  • Amazing Benefits Package
  • Making Social Impact
  • Talent Development and Community Building

Jobs Related To Okta Staff Site Reliability Engineer

Staff Site Reliability Engineer - Cyber Defence Engineering

Staff Site Reliability Engineer position at Okta focusing on cyber defense systems, requiring Python expertise and 10+ years of experience in SRE/software engineering.

Staff Site Reliability Engineer

Staff Site Reliability Engineer position at Fivetran, focusing on infrastructure reliability, monitoring, and optimization of cloud-based systems.

Staff Reliability Engineer

Staff Reliability Engineer position at The Hartford focusing on building and maintaining large-scale distributed systems with emphasis on reliability and automation.

Lead Application Support Engineer (SRE)

Lead Application Support Engineer (SRE) position at Marriott Vacations Worldwide, focusing on application reliability and performance optimization using modern technologies and cloud platforms.