Site Reliability Engineering (SRE) at Google Cloud combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As a Software Engineer III in SRE, you will ensure that Google Cloud's services have the reliability and uptime appropriate to customers' needs, along with a fast rate of improvement. You'll work on optimizing existing systems, building infrastructure, and eliminating manual work through automation.
The role offers unique challenges of scale specific to Google Cloud, allowing you to apply your expertise in coding, algorithms, complexity analysis, and large-scale system design. SRE's culture values diversity, intellectual curiosity, problem-solving, and openness. The organization brings together people with varied backgrounds and perspectives, encouraging collaboration and innovation in a blame-free environment.
You'll manage project priorities, deadlines, and deliverables while designing, developing, testing, deploying, maintaining, and enhancing software solutions. Key responsibilities include writing code, reviewing others' code, contributing to documentation, troubleshooting issues, and participating in design reviews.
This position requires a Bachelor's degree in Computer Science or a related field (or equivalent experience) and at least 2 years of experience with data structures, algorithms, and software development. Preferred qualifications include experience with distributed systems, the ability to debug and optimize code, and excellent problem-solving and communication skills.
Join Google's SRE team to tackle complex challenges, learn and grow with support and mentorship, and make a significant impact on large-scale systems that serve millions of users worldwide.