Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. This role focuses on ensuring Google's services maintain reliability and appropriate uptime while continuously improving performance and capacity. As an SRE III, you'll work on optimizing existing systems, building infrastructure, and automating processes. The position requires expertise in coding, algorithms, complexity analysis, and large-scale system design.
The role offers the opportunity to work on scaling challenges specific to Google's infrastructure. You'll be part of a culture that values intellectual curiosity, problem-solving, and openness, and that brings together diverse perspectives in a blame-free environment. The team promotes self-direction while providing support and mentorship for growth and learning.
Your responsibilities will include managing project priorities, designing and developing software solutions, and maintaining critical infrastructure. You'll work with the Technical Infrastructure team to support Google's entire product portfolio, ensuring optimal performance and user experience. The position offers competitive compensation including base salary, bonus, equity, and comprehensive benefits.
This is an excellent opportunity for engineers passionate about large-scale systems, automation, and high-reliability services. You'll collaborate with talented peers, work on meaningful projects, and contribute to systems that impact billions of users globally.