Google's Site Reliability Engineering (SRE) team is seeking a skilled engineer to join its Platforms Infrastructure Engineering division. This role combines software and systems engineering to build and maintain Google Cloud's large-scale distributed systems. As an SRE, you'll be responsible for ensuring the reliability and uptime of both internal and external systems while handling the complex challenges unique to Google Cloud's scale.
The position offers an opportunity to work with cutting-edge technology and contribute to the platform's reliability and performance during a period of significant innovation. You'll join a team that values intellectual curiosity, problem-solving, and openness, and that works in a blame-free environment encouraging collaboration and risk-taking.
Your responsibilities will include developing platform-specific monitoring systems, characterizing new platforms, and driving reliability considerations into platform design. You'll work closely with development and SRE teams, leveraging your expertise in systems analysis, troubleshooting, and automation to optimize Google's compute infrastructure.
The role requires strong technical skills in programming languages such as Python, Go, or Java, combined with deep systems knowledge and experience with Unix-based operating systems. You'll contribute to Google's broader mission of building and maintaining the infrastructure that powers its global services, with a direct impact on the reliability and performance of Google's platforms.