Sr. Site Reliability Engineer

AuthZed

Open Source, Google Zanzibar-inspired permissions database

New York, NY, USA

$150,000 - $195,000

Site Reliability

Senior Software Engineer

Remote

11 - 50 Employees

3+ years of experience

Enterprise SaaS · Cybersecurity

Description For Sr. Site Reliability Engineer

We are seeking a Site Reliability Engineer to join our pioneering open-source authorization solutions company. As an SRE, you'll be crucial in ensuring system reliability, availability, and performance for SpiceDB - our open-source permissions database inspired by Google's Zanzibar system.

The role involves designing and implementing scalable infrastructure, monitoring system performance, and automating deployments. You'll work with containerization (Docker, Kubernetes), infrastructure-as-code tools (Terraform, Pulumi), and several programming languages (Node.js, Java, Python, Ruby, Go).

AuthZed is a fully remote company with $12M in Series A funding, focused on building managed services for planet-scale production authorization. We offer a software-driven culture where every team member, including sales, understands and loves our technology.

Key responsibilities include:

Designing and maintaining highly available infrastructure solutions
Monitoring and optimizing system performance
Automating infrastructure deployment
Implementing security measures and best practices
Participating in on-call rotation
Collaborating with engineering teams

We value agency, collaboration, and open-mindedness. Our team of 24 works across the US and Europe, bringing integrity to all interactions and fostering confidence in decision-making. This is an excellent opportunity to join a growing startup that's becoming the open-source standard in authorization database technology.

The ideal candidate will have strong experience with Site Reliability Engineering, System Design, and Distributed Computing, along with excellent problem-solving and communication skills. If you're passionate about infrastructure and authorization systems, and want to work with a hardworking group that respects every team member's voice, we'd love to have you join our team.

Last updated 3 months ago

Responsibilities For Sr. Site Reliability Engineer

Design, implement, and maintain highly available and scalable infrastructure solutions
Monitor and analyze system performance
Automate infrastructure deployment and configuration management
Improve system reliability, security, and efficiency
Troubleshoot and resolve complex infrastructure issues
Collaborate with software engineering teams
Participate in on-call rotation
Document system configurations and procedures

Requirements For Sr. Site Reliability Engineer

Java · JavaScript · Python · Ruby · Node.js · Kubernetes

Proven experience as a Site Reliability Engineer
Strong understanding of networking, operating systems, and cloud infrastructure
Experience with Site Reliability Engineering, System Design, and Distributed Computing
Experience with multiple programming languages (Node.js, Java, Python, Ruby, Go)
Experience with Docker and Kubernetes
Knowledge of infrastructure-as-code tools (Terraform, Pulumi)
Familiarity with monitoring tools (Prometheus, Grafana, ELK stack)
Experience with relational databases
Experience with Git and GitHub
Experience with CI/CD systems
Strong problem-solving skills
Excellent communication abilities