Senior Software Engineer, Site Reliability Engineering

Google is a global technology company that builds innovative products and services used by billions of users.
Site Reliability
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
Enterprise SaaS

Job Description

Google's Site Reliability Engineering (SRE) team is at the forefront of ensuring the reliability and performance of Google Cloud's massive distributed systems. As a Senior SRE, you'll combine software and systems engineering expertise to build and maintain fault-tolerant systems at unprecedented scale. The role offers unique challenges in managing complex infrastructure while focusing on automation, system optimization, and maintaining high reliability standards.

The position sits within Google's Technical Infrastructure team, which is fundamental to keeping Google's vast product portfolio running smoothly. You'll be working on critical systems that directly impact billions of users worldwide, with opportunities to design, deploy, and refine large-scale distributed systems. The role requires a blend of deep technical expertise and leadership skills, as you'll be guiding projects and providing technical direction.

This is an exceptional opportunity for engineers passionate about distributed systems and infrastructure at scale. You'll work in a culture that values intellectual curiosity and problem-solving, with access to some of the most advanced technology infrastructure in the world. The role offers significant growth potential and the chance to work with talented engineers while solving complex technical challenges that few companies encounter.

Last updated 10 days ago

Responsibilities For Senior Software Engineer, Site Reliability Engineering

  • Engage in and improve the whole lifecycle of services—from inception and design, through to deployment, operation and refinement
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
  • Practice sustainable incident response and blameless postmortems

Requirements For Senior Software Engineer, Site Reliability Engineering

Linux
Kubernetes
  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 5 years of experience with software development in one or more programming languages
  • 3 years of experience in designing, analyzing, and troubleshooting large-scale distributed systems
  • 2 years of experience leading projects and providing technical leadership
  • Experience working in computing, distributed systems, storage, or networking
  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
  • Ability to debug and optimize code and to automate routine tasks
  • Systematic problem-solving approach, coupled with effective verbal and written communication skills

Benefits For Senior Software Engineer, Site Reliability Engineering

  • Comprehensive health benefits, including medical insurance
  • Retirement plans (401(k))
  • Parental leave
