Senior Software Engineer, SRE, Cloud Incident Response

Google is a global technology company that builds innovative products and services used by billions of users.
Site Reliability
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
Enterprise SaaS · Cloud

Description For Senior Software Engineer, SRE, Cloud Incident Response

Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As a Senior SRE focusing on Cloud Incident Response, you'll be responsible for ensuring Google Cloud Platform's stability and reliability through critical incident support and continuous improvement. You'll work on building systems and tooling to improve visibility into Cloud state, detection of large-scale issues, and communications with stakeholders. The role requires expertise in distributed systems, incident management, and technical leadership.

The position is part of Google's Technical Infrastructure team, which builds and maintains the architecture behind Google's product portfolio. You'll be working with a team that takes pride in being the engineers' engineers, focusing on keeping networks running optimally and ensuring the best possible user experience.

The ideal candidate will bring strong experience in software development, distributed systems, and incident management, combined with excellent problem-solving and communication skills. You'll have the opportunity to work on unique challenges of scale specific to Google Cloud while contributing to the development of processes and tools that enhance platform reliability.

This role offers the chance to work with Google's SRE culture of intellectual curiosity and problem-solving, in an environment that encourages collaboration and big thinking. You'll be part of an organization that brings together diverse perspectives and backgrounds, promoting self-direction while providing support and mentorship for growth and learning.

Last updated 18 hours ago

Responsibilities For Senior Software Engineer, SRE, Cloud Incident Response

  • Ensure Google Cloud Platform (GCP) stability and reliability through critical incident support
  • Create training and end-to-end processes for the incident management life-cycle
  • Build systems and tooling to support the Incident Response team
  • Define and escalate risks in Cloud, and reduce the probability of major incidents
  • Ensure the scalability and reliability of systems throughout their life-cycle

Requirements For Senior Software Engineer, SRE, Cloud Incident Response

Skills: Linux, Kubernetes
  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 5 years of experience with software development in one or more programming languages
  • 5 years of experience with data structures or algorithms
  • 3 years of experience in designing, analyzing, and troubleshooting distributed systems
  • 2 years of experience leading projects and providing technical leadership
  • Experience in SRE or incident management/response environments