Google is seeking a Senior Software Engineer to join its Site Reliability Engineering (SRE) team, focusing on Cloud Incident Response. This role combines software and systems engineering to build and maintain large-scale distributed systems for Google Cloud Platform (GCP). The position requires expertise in distributed systems, incident management, and software development.
As an SRE, you'll be responsible for ensuring the reliability and uptime of Google Cloud's services, both internal and customer-facing. The role involves optimizing existing systems, building infrastructure, and automating processes to improve efficiency and reliability. You'll work on complex challenges unique to Google Cloud's scale while applying your expertise in coding, algorithms, and system design.
The position offers the opportunity to work in a culture that values intellectual curiosity and problem-solving. You'll be part of an organization that brings together diverse perspectives and encourages collaboration in a blame-free environment. The role involves both independent work on meaningful projects and collaborative work backed by supportive mentorship.
Key responsibilities include maintaining GCP stability through incident support, developing incident management processes, building tooling for improved system visibility, and implementing proactive measures to reduce major incidents. You'll work closely with Cloud Support leadership and contribute to system design, capacity planning, and continuous improvement initiatives.
This is an ideal role for someone who combines strong technical skills with leadership ability, has a passion for system reliability, and wants to make a significant impact on Google's cloud infrastructure. The position offers the chance to work on cutting-edge technology while ensuring millions of users have a reliable and efficient cloud experience.