Braze, a leading customer engagement platform, is seeking a Senior Site Reliability Engineer II with a focus on Kafka to join its team. This role combines software engineering and systems administration to ensure site reliability and infrastructure scalability. The platform operates at considerable scale, serving over 3.3 billion monthly active users and processing hundreds of billions of data points each month.
The role demands expertise in Kafka performance tuning, monitoring, and automation, with responsibilities spanning architecture design, debugging, and incident management. You'll work with a technology stack including Ruby on Rails, MongoDB, Redis, Kafka, and Kubernetes, creating infrastructure as code and developing deployment pipelines.
As an SRE at Braze, you'll be instrumental in maintaining high availability and meeting enterprise-grade SLAs. The position requires strong collaboration skills, as you'll work with engineering teams to architect scalable solutions and improve infrastructure reliability. You'll also participate in on-call rotations and contribute to incident prevention and resolution.
The ideal candidate brings 5+ years of SRE/DevOps experience, with specific expertise in Kafka streaming applications and performance tuning. You should be passionate about automation, have strong programming skills (particularly in Ruby or Go), and possess deep knowledge of Linux systems.
Braze offers an exceptional work environment with comprehensive benefits, including equity participation, flexible PTO, and extensive professional development opportunities. The company is recognized as a Great Place to Work® across multiple regions and consistently ranks among the best technology workplaces. This role offers the opportunity to make a significant impact while working with a passionate, collaborative team in a remote setting.