Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As a Software Engineering Manager on the SRE team, you'll lead a team responsible for ensuring that Google's services stay reliable and maintain appropriate uptime while continuously improving their performance and capacity. The role involves solving complex challenges unique to Google's scale, drawing on expertise in coding, algorithms, and large-scale system design.
The position is within Google's Technical Infrastructure team, which is fundamental to keeping Google's extensive product portfolio running smoothly. Your team will maintain and develop data centers and next-generation Google platforms, with the goal of optimal network performance and user experience.
The role requires strong technical leadership skills, with a focus on mentoring and growing engineering teams in a fast-paced environment. You'll oversee the availability and performance of critical services, implement automation strategies, and manage global on-call rotations. The position combines technical expertise with people management, requiring both deep systems knowledge and the ability to inspire and develop team members.
SRE at Google promotes a culture of diversity, intellectual curiosity, and problem-solving in a blame-free environment. The team brings together individuals with varied backgrounds and perspectives, encouraging collaboration and innovative thinking. This role offers the opportunity to work on meaningful projects with significant impact while receiving support and mentorship for continuous learning and growth.