Google Cloud's Site Reliability Engineering (SRE) team is seeking a Senior Software Engineer to join its mission of building and maintaining large-scale distributed systems. This role combines software and systems engineering to keep Google Cloud's services reliable and performant.
As an SRE, you'll tackle complex challenges unique to Google's scale, applying your expertise in coding, algorithms, and system design. You'll be responsible for the entire service lifecycle, from initial design through deployment and ongoing operation. The role involves building infrastructure, automating processes, and optimizing existing systems to eliminate manual work.
The position sits within Google's Technical Infrastructure organization, which is fundamental to keeping Google's vast product portfolio running smoothly. You'll work with cutting-edge technology and collaborate with talented engineers who take pride in building robust, scalable systems.
The team culture emphasizes diversity, intellectual curiosity, and blameless problem-solving. You'll have opportunities to work on meaningful projects with significant impact while receiving support and mentorship to grow your skills. The role suits engineers who are passionate about building reliable, scalable infrastructure.
Key responsibilities include system design consulting, capacity planning, launch reviews, and maintaining service health through monitoring and automation. You'll also participate in incident response and help drive continuous improvement through postmortem analysis.
This is an excellent opportunity for an experienced engineer looking to work at scale, solve complex distributed systems challenges, and make a direct impact on Google Cloud's infrastructure reliability.