Tech Lead, Senior Site Reliability Engineer

Google is a global technology company that builds innovative products and services used by billions of users.
Site Reliability
Staff Software Engineer
In-Person
5,000+ Employees
5+ years of experience
Enterprise SaaS

Job Description

Site Reliability Engineering (SRE) at Google is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As a Tech Lead, Senior SRE, you'll be responsible for ensuring that Google's services have appropriate reliability and uptime while keeping performance and capacity optimized. The role calls for creative engineering solutions to operations problems, with a focus on optimizing existing systems, building infrastructure, and automating routine tasks.

The position sits within Google's Technical Infrastructure team, which keeps Google's vast product portfolio running. You'll develop and maintain data centers and build next-generation Google platforms. The team takes pride in being the engineers' engineers and focuses on keeping networks running optimally so that users get the best possible experience.

The role requires a strong technical background with at least 5 years of software development experience and expertise in distributed systems. You'll lead projects and provide technical leadership, working with various tools and approaches to solve a broad spectrum of problems. The culture emphasizes intellectual curiosity, problem-solving, and openness, bringing together people with diverse backgrounds and perspectives.

Key aspects of the role span the complete service lifecycle, from design through deployment and refinement. Before launch, you'll support services through system design consulting and capacity planning. Once services are live, you'll maintain them by measuring and monitoring availability, latency, and overall system health. You'll also scale systems through automation and drive changes that improve reliability and velocity.

This is an excellent opportunity for a seasoned engineer looking to take on technical leadership in a role that combines software engineering with systems operations at massive scale. The position offers the chance to work on some of the world's largest distributed systems while leading and mentoring other engineers in a blame-free, collaborative environment focused on continuous improvement.

Last updated 19 days ago

Responsibilities For Tech Lead, Senior Site Reliability Engineer

  • Engage in and improve the whole lifecycle of services, from inception and design through to deployment, operation and refinement
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
  • Practice sustainable incident response and blameless postmortems

Requirements For Tech Lead, Senior Site Reliability Engineer

Go
Linux
Kubernetes
  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 5 years of experience with software development in one or more programming languages
  • 5 years of experience with data structures or algorithms
  • 3 years of experience in designing, analyzing, and troubleshooting distributed systems
  • 2 years of experience leading projects and providing technical leadership
  • Experience with C++/Go
  • Ability to debug and optimize code and to automate routine tasks
  • Excellent problem-solving approach, with effective verbal and written communication skills
