Staff Software Engineer, Site Reliability Engineering, Google Cloud

Google is a global technology company that builds and maintains the large-scale distributed systems and infrastructure powering its product portfolio.
Site Reliability
Staff Software Engineer
Hybrid
5,000+ Employees
8+ years of experience
Enterprise SaaS · Cloud
This job posting may no longer be active.

Description For Staff Software Engineer, Site Reliability Engineering, Google Cloud

Site Reliability Engineering (SRE) at Google Cloud combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As a Staff SRE, you'll be responsible for ensuring Google Cloud's services maintain reliability and appropriate uptime while continuously improving performance. The role involves managing complex challenges of scale unique to Google Cloud, drawing on expertise in coding, algorithms, complexity analysis, and large-scale system design.

The position is part of Google's Technical Infrastructure team, which plays a fundamental role in developing and maintaining data centers and building next-generation Google platforms. The team takes pride in being the engineers' engineers, focusing on keeping networks running optimally for the best user experience.

SRE's culture emphasizes diversity, intellectual curiosity, problem-solving, and openness. The organization brings together people with diverse backgrounds and perspectives, encouraging collaboration and risk-taking in a blame-free environment. Google promotes self-direction on meaningful projects while providing support and mentorship for continuous learning and growth.

The role offers the opportunity to work with cutting-edge technology, contribute to critical infrastructure, and impact billions of users worldwide. You'll be part of a team that values innovation, technical excellence, and sustainable system design, while maintaining Google's high standards for reliability and performance.

Last updated 2 months ago

Responsibilities For Staff Software Engineer, Site Reliability Engineering, Google Cloud

  • Engage in and improve the whole lifecycle of services, from inception and design through to deployment, operation, and refinement
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
  • Practice sustainable incident response and blameless postmortems

Requirements For Staff Software Engineer, Site Reliability Engineering, Google Cloud

Linux
Kubernetes
  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 8 years of experience with data structures or algorithms
  • 5 years of experience with software development in one or more programming languages
  • 3 years of experience leading projects and designing, analyzing, and troubleshooting distributed systems
  • Experience working in computing, distributed systems, storage, or networking
  • Ability to debug and optimize code and to automate routine tasks
  • Systematic problem-solving approach, coupled with effective verbal and written communication skills
  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
