Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As a Senior Site Reliability Engineer for Cloud Spanner at Google, you'll ensure that Google's services have reliability and uptime appropriate to users' needs, while maintaining a fast rate of improvement. You'll be responsible for managing complex challenges of scale unique to Google, optimizing existing systems, building infrastructure, and eliminating manual work through automation.
The role requires expertise in coding, algorithms, complexity analysis, and large-scale system design. You'll work on meaningful projects in a blame-free environment that encourages collaboration, intellectual curiosity, and risk-taking. The SRE team at Google brings together people with diverse backgrounds and perspectives, promoting self-direction while providing support and mentorship for learning and growth.
Key responsibilities include planning and executing projects to improve reliability and efficiency, and collaborating with Software Engineering and other partner teams to evolve and expand Cloud Spanner's capabilities. You'll be part of the Technical Infrastructure team, which plays a central role in developing and maintaining Google's data centers and building the next generation of Google platforms.
This position offers the opportunity to work on cutting-edge technology, solve unique challenges at scale, and contribute to the reliability and performance of Google's critical systems. If you're passionate about distributed systems, enjoy troubleshooting complex issues, and want to make a significant impact on Google's infrastructure, this role is an excellent fit for you.