
Site Reliability Engineer, Platforms Infrastructure Engineering

A global technology company that specializes in internet-related services and products.
Site Reliability
Mid-Level Software Engineer
In-Person
5,000+ Employees
2+ years of experience
Enterprise SaaS

Job Description

Google's Site Reliability Engineering (SRE) team is seeking a skilled engineer to join its Platforms Infrastructure Engineering division. This role combines software and systems engineering to build and maintain Google Cloud's large-scale, distributed systems. As an SRE, you'll be responsible for ensuring the reliability and uptime of both internal and external systems while managing complex challenges unique to Google Cloud's scale.

The position offers an opportunity to work with cutting-edge technology and contribute to the platform's reliability and performance during a period of significant innovation. You'll be part of a team that values intellectual curiosity, problem-solving, and openness, working in a blame-free environment that encourages collaboration and risk-taking.

Your responsibilities will include developing platform-specific monitoring systems, characterizing new platforms, and driving reliability considerations into platform design. You'll work closely with development and SRE teams, leveraging your expertise in systems analysis, troubleshooting, and automation to optimize Google's compute infrastructure.

The role requires strong technical skills in programming languages such as Python, Go, or Java, combined with deep systems knowledge and experience with Unix-based operating systems. You'll be part of Google's broader mission to build and maintain the infrastructure that powers its global services, making a direct impact on the reliability and performance of Google's platforms.

Last updated 6 days ago

Responsibilities For Site Reliability Engineer, Platforms Infrastructure Engineering

  • Drive an understanding of production reliability into platform design and development through consulting, model development, and automation
  • Own the characterization and qualification of new platforms. Build reliability through understanding of the platform's performance and capabilities
  • Develop per-platform, capability-focused Service Level Objectives (SLOs), monitoring, and alerts
  • Address the challenges created by the introduction of new technologies into Google's production systems
  • Learn about the software and hardware that underpins Google's production systems and interact with the development and SRE teams

Requirements For Site Reliability Engineer, Platforms Infrastructure Engineering

Python
Go
Java
Linux
  • Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience
  • 2 years of experience in one or more of the following: C, C++, Java, Python, Go, Perl, or Ruby
  • Experience analyzing and troubleshooting systems

Benefits For Site Reliability Engineer, Platforms Infrastructure Engineering

Medical Insurance
401k
Parental Leave
  • Comprehensive health benefits
  • Retirement plans
  • Parental leave support