Google's Site Reliability Engineering (SRE) team is seeking a Senior Software Engineer to join its Cloud Incident Response team. This role combines software and systems engineering to build and maintain large-scale distributed systems for Google Cloud Platform. The position focuses on ensuring service reliability, managing incidents, and driving continuous improvement through automation.
As an SRE, you'll tackle complex challenges unique to Google Cloud's scale, applying expertise in coding, algorithms, and system design. The role involves critical incident support, building tooling for incident response, and implementing processes to improve system reliability. You'll work in a culture that values intellectual curiosity and problem-solving, collaborating with diverse teams across Google's Technical Infrastructure organization.
The ideal candidate brings strong experience in distributed systems, incident management, and technical leadership. You'll be responsible for maintaining system stability, developing automation tools, and driving improvements in incident response processes. This is an opportunity to work on mission-critical systems that power Google's vast product portfolio while contributing to the evolution of cloud infrastructure.
Working at Google offers exposure to cutting-edge technology, collaboration with world-class engineers, and the chance to impact billions of users. The role provides opportunities for growth, learning, and technical leadership in a supportive environment that promotes self-direction and innovation. Join Google's SRE team to help build and maintain the future of cloud computing while solving some of the most interesting technical challenges in the industry.