Senior Software Engineer, SRE, Cloud Incident Response

Google is a global technology company that builds innovative products and services used by billions of users.
Site Reliability
Senior Software Engineer
Hybrid
5,000+ Employees
5+ years of experience
Enterprise SaaS · Cloud

Description For Senior Software Engineer, SRE, Cloud Incident Response

Google is seeking a Senior Software Engineer to join its Site Reliability Engineering (SRE) team, focusing on Cloud Incident Response. This role combines software and systems engineering to build and maintain the large-scale, distributed systems that power Google Cloud's services. The position is critical to ensuring the reliability and uptime of both internal and customer-facing systems.

The role involves working with complex challenges unique to Google Cloud's scale, requiring expertise in coding, algorithms, complexity analysis, and large-scale system design. You'll be part of a culture that values intellectual curiosity, problem-solving, and openness, working in a blame-free environment that encourages collaboration and innovation.

As an SRE focusing on Cloud Incident Response, you'll be responsible for ensuring GCP's stability through critical incident support, developing training and processes for incident management, and building tools to improve cloud system visibility. You'll work on reducing the probability of major incidents and on keeping systems scalable throughout their lifecycle.

The position offers the flexibility of hybrid work arrangements across several European locations including London, Zürich, Dublin, and Warsaw, with remote work options available for UK-based candidates. You'll be joining Google's Technical Infrastructure team, which is fundamental to keeping Google's vast product portfolio running smoothly.

The ideal candidate has strong experience in software development, distributed systems, and incident management, combined with excellent problem-solving and communication skills. You'll have the opportunity to work on meaningful projects while receiving support and mentorship to continue learning and growing in your career.

This role is perfect for someone who enjoys the challenges of large-scale systems, has a passion for reliability engineering, and wants to make a significant impact on the stability and performance of Google Cloud Platform's services. You'll be working with cutting-edge technology while collaborating with some of the best engineers in the industry.

Last updated a day ago

Responsibilities For Senior Software Engineer, SRE, Cloud Incident Response

  • Ensure Google Cloud Platform (GCP) stability and reliability through critical incident support
  • Create training and end-to-end processes for the incident management life-cycle
  • Build systems and tooling to support the Incident Response team
  • Define and escalate risks in Cloud and reduce the probability of major incidents
  • Ensure the scalability and reliability of systems throughout their life-cycle

Requirements For Senior Software Engineer, SRE, Cloud Incident Response

Skills: Kubernetes, Linux
  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 5 years of experience with software development in one or more programming languages
  • 5 years of experience with data structures or algorithms
  • 3 years of experience in designing, analyzing, and troubleshooting distributed systems
  • 2 years of experience leading projects and providing technical leadership
  • Experience in SRE or incident management/response environments
