Senior Software Engineer, SRE, Cloud Incident Response

Google

Google is a global technology company that builds innovative products and services used by billions of users.

London, UK

Site Reliability

Senior Software Engineer

In-Person

5,000+ Employees

5+ years of experience

Enterprise SaaS · Cloud

Description For Senior Software Engineer, SRE, Cloud Incident Response

Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As a Senior SRE focusing on Cloud Incident Response, you'll be responsible for ensuring Google Cloud Platform's stability and reliability through critical incident support and continuous improvement. You'll work on building systems and tooling to improve visibility into Cloud state, detection of large-scale issues, and communications with stakeholders. The role requires expertise in distributed systems, incident management, and technical leadership.

The position is part of Google's Technical Infrastructure team, which builds and maintains the architecture behind Google's product portfolio. You'll be working with a team that takes pride in being the engineers' engineers, focusing on keeping networks running optimally and ensuring the best possible user experience.

The ideal candidate will bring strong experience in software development, distributed systems, and incident management, combined with excellent problem-solving and communication skills. You'll have the opportunity to work on unique challenges of scale specific to Google Cloud while contributing to the development of processes and tools that enhance platform reliability.

This role offers the chance to work within Google's SRE culture of intellectual curiosity and problem-solving, in an environment that encourages collaboration and big thinking. You'll be part of an organization that brings together diverse perspectives and backgrounds, promoting self-direction while providing support and mentorship for growth and learning.

Last updated 18 hours ago

Responsibilities For Senior Software Engineer, SRE, Cloud Incident Response

Ensure Google Cloud Platform (GCP) stability and reliability through critical incident support
Create training and end-to-end processes for the incident management life-cycle
Build systems and tooling to support the Incident Response team
Define and escalate risks in Cloud, and reduce the probability of major incidents
Ensure the scalability and reliability of systems throughout their life-cycle

Requirements For Senior Software Engineer, SRE, Cloud Incident Response

Linux

Kubernetes

Bachelor's degree in Computer Science, a related field, or equivalent practical experience
5 years of experience with software development in one or more programming languages
5 years of experience with data structures or algorithms
3 years of experience in designing, analyzing, and troubleshooting distributed systems
2 years of experience leading projects and providing technical leadership
Experience in SRE or incident management/response environments

