Google Cloud is seeking a Senior Software Developer for Site Reliability Development to join its team in Waterloo. This role combines software and systems development to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE Developer, you'll ensure Google's services maintain reliability and appropriate uptime while focusing on system optimization, infrastructure development, and automation. You'll tackle scaling challenges unique to Google's infrastructure while applying expertise in coding, algorithms, and large-scale system design.
The role involves managing the complete lifecycle of services, from design through deployment and refinement. You'll work on system design consulting, develop software platforms and frameworks, conduct capacity planning, and perform launch reviews. Post-deployment responsibilities include monitoring system health, implementing automation for scale, and handling incident response.
The Technical Infrastructure team builds and maintains the architecture supporting Google's entire product portfolio. From developing data centers to creating next-generation platforms, the team ensures optimal performance and reliability of Google's vast network infrastructure.
This position offers the opportunity to work with complex distributed systems at massive scale, collaborate with talented engineers, and contribute to Google's critical infrastructure. The team promotes intellectual curiosity, problem-solving, and openness while encouraging self-direction on meaningful projects. You'll have access to mentorship and support for continued learning and growth.
The ideal candidate combines strong software development skills with systems engineering expertise and brings experience leading technical projects. If you're passionate about reliability, scalability, and automation, and want to work on infrastructure that serves billions of users, this role offers an opportunity to make a significant impact.