Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and maintain large-scale distributed systems. This role is on the Caching SRE team within Core Data Foundations, which manages critical services that underpin Search, Ads, Gaea Identity, Workspace, and other systems. The position focuses on keeping Google Cloud's services at appropriate levels of reliability and uptime while continuously improving performance and capacity.
As an SRE, you'll tackle scaling challenges specific to Google Cloud, applying expertise in coding, algorithms, and system design. The role involves optimizing existing systems, building infrastructure, and automating away manual work. The team manages critical services such as Static Content Service, Laelaps, Punctual, and Memstore.
The position offers a blame-free environment that encourages intellectual curiosity, collaboration, and risk-taking. Google promotes self-direction while providing support and mentorship for professional growth. The role combines technical challenges with responsibility for maintaining services that impact billions of users.
The ideal candidate will have strong programming skills, experience with distributed systems, and a passion for solving complex technical problems. This role offers the chance to work with cutting-edge technology while contributing to the reliability of Google's global infrastructure.