Google Cloud's Site Reliability Engineering (SRE) team is seeking a Staff Software Engineer to help build and maintain large-scale, distributed systems. This role combines software and systems engineering to ensure Google Cloud's services maintain optimal reliability and performance. As an SRE, you'll tackle complex challenges unique to Google's scale, focusing on system optimization, infrastructure development, and automation. The position requires expertise in coding, algorithms, and distributed systems design.
The role involves working with Google's Technical Infrastructure team, which is fundamental to Google's product portfolio. You'll be part of a team that values intellectual curiosity, problem-solving, and collaboration in a blame-free environment. The position offers opportunities to work on meaningful projects while receiving support and mentorship for professional growth.
Key responsibilities include managing the service lifecycle from design through deployment and supporting services through system design consulting, capacity planning, and launch reviews. You'll monitor system health metrics, implement automation for scalability, and participate in incident response. The role requires strong technical skills and leadership experience. It offers a competitive compensation package that includes base salary, bonus, equity, and comprehensive benefits.
This is an excellent opportunity for experienced engineers who are passionate about building reliable, scalable systems and want to make a significant impact on Google Cloud's infrastructure. Pairing technical depth with leadership opportunities, the role is well suited to those looking to advance their careers in site reliability engineering at one of tech's leading companies.