Senior Site Reliability Engineer II (Kafka)

Leading customer engagement platform that empowers brands to be absolutely engaging, helping companies build and maintain engaging relationships with customers that foster growth and loyalty.
Ontario, Canada
DevOps
Senior Software Engineer
Remote
1,000 - 5,000 Employees
5+ years of experience
Enterprise SaaS

Description For Senior Site Reliability Engineer II (Kafka)

Braze is seeking a Senior Site Reliability Engineer II with a focus on Kafka to join their dynamic team. This role combines software engineering and systems administration to ensure the reliability and scalability of Braze's massive infrastructure, which serves over 3.3 billion monthly active users and processes hundreds of billions of data points monthly.

The position requires deep expertise in Kafka and distributed systems, with responsibilities ranging from performance tuning and automation to incident management and infrastructure development. You'll work with a technology stack that includes Ruby on Rails, MongoDB, Redis, Kafka, and Kubernetes, building robust infrastructure solutions and maintaining enterprise-grade SLAs.

As an SRE at Braze, you'll collaborate with engineering teams to architect scalable solutions, develop infrastructure as code, and create deployment pipelines. The role involves being part of an on-call rotation and contributing to a culture of continuous improvement through incident retrospectives and automation initiatives.

The ideal candidate brings 5+ years of SRE/DevOps experience, with specific expertise in Kafka performance tuning and streaming applications. You should be passionate about solving complex systems challenges, have strong programming skills (particularly in Ruby or Go), and thrive in a collaborative, fast-paced environment.

Braze offers an exceptional work environment with comprehensive benefits, including equity compensation, flexible PTO, and extensive professional development opportunities. The company is recognized as a Great Place to Work® across multiple regions and consistently ranks among the best technology workplaces. This role offers the opportunity to make a significant impact at a rapidly growing, global customer engagement platform while working with cutting-edge technologies and a passionate team.

Last updated 4 days ago

Responsibilities For Senior Site Reliability Engineer II (Kafka)

  • Partner with engineering teams on architecting scalable and reliable products
  • Debug reliability and scalability issues across all stack layers
  • Implement monitoring and alerting systems
  • Ensure strict enterprise-grade SLAs are met
  • Create infrastructure as code using Chef, Terraform, and Kubernetes
  • Develop deployment pipelines using Docker and Kubernetes
  • Manage incidents and participate in the PagerDuty on-call rotation
  • Create system improvements and automation

Requirements For Senior Site Reliability Engineer II (Kafka)

Kafka
Kubernetes
MongoDB
Redis
Ruby
  • 5+ years of experience as a Software, DevOps, or Site Reliability Engineer
  • 3+ years of Data Streaming Reliability Engineering
  • 3+ years of Kafka performance tuning & automation
  • Strong background in scaling Kafka clusters
  • Experience with Docker, Kubernetes, Terraform
  • Experience with MongoDB, Redis, Kafka, Postgres
  • Strong programming skills (Ruby and/or Go preferred)
  • Linux and Unix Shell expertise
  • Excellent ability to manage multiple tasks

Benefits For Senior Site Reliability Engineer II (Kafka)

401k
Dental Insurance
Education Budget
Equity
Medical Insurance
Mental Health Assistance
Parental Leave
Vision Insurance
  • Competitive compensation with equity
  • Retirement and Employee Stock Purchase Plans
  • Flexible paid time off
  • Comprehensive medical, dental, vision, life, and disability benefits
  • Family services including fertility benefits and parental leave
  • Professional development and yearly learning stipend
  • In-office employee experience
  • Volunteer opportunities and donation matching
  • Employee Resource Groups
