Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and maintain large-scale distributed systems. This role focuses on keeping Google's services reliable and performant while continuously improving them. As a Systems Engineer III in SRE, you'll tackle complex scalability challenges unique to Google's infrastructure while applying expertise in coding, algorithms, and system design. The role involves both hands-on technical work and collaboration with global teams.
The position requires strong programming skills and deep knowledge of Linux/Unix systems and networking. You'll be responsible for improving the reliability of critical enterprise applications, participating in on-call rotations, and driving technical improvements through automation and system evolution. The role offers opportunities to work with cutting-edge technology at massive scale while contributing to Google's technical infrastructure.
SRE at Google promotes a culture of intellectual curiosity and problem-solving in a blame-free environment. The team brings together diverse perspectives and backgrounds, encouraging collaboration and innovative thinking. You'll have the chance to work on meaningful projects with self-direction while receiving support and mentorship for professional growth.
This role is part of the Technical Infrastructure team, which is fundamental to Google's operations: it builds and maintains the architecture that powers Google's entire product portfolio. This includes developing data centers, creating next-generation platforms, and ensuring networks deliver the best possible user experience.