Braze is seeking a Senior Site Reliability Engineer II with a focus on Kafka to join its dynamic team. This role combines software engineering and systems administration to ensure the reliability and scalability of Braze's massive infrastructure, which serves over 3.3 billion monthly active users and processes hundreds of billions of data points each month.
The position requires deep expertise in Kafka and distributed systems, with responsibilities spanning from performance tuning and automation to incident management and infrastructure development. You'll work with a technology stack including Ruby on Rails, MongoDB, Redis, Kafka, and Kubernetes, creating robust infrastructure solutions and maintaining enterprise-grade SLAs.
As an SRE at Braze, you'll collaborate with engineering teams to architect scalable solutions, develop infrastructure as code, and create deployment pipelines. You'll also take part in an on-call rotation and contribute to a culture of continuous improvement through incident retrospectives and automation initiatives.
The ideal candidate brings 5+ years of SRE/DevOps experience, with specific expertise in Kafka performance tuning and streaming applications. You should be passionate about solving complex systems challenges, have strong programming skills (particularly in Ruby or Go), and thrive in a collaborative, fast-paced environment.
Braze offers an exceptional work environment with comprehensive benefits, including equity compensation, flexible PTO, and extensive professional development opportunities. The company is recognized as a Great Place to Work® across multiple regions and consistently ranks among the best technology workplaces. This role offers the opportunity to make a significant impact at a rapidly growing, global customer engagement platform while working with cutting-edge technologies and a passionate team.