Site Reliability Engineering (SRE) at Google Cloud combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll ensure that Google Cloud's services have reliability and uptime appropriate to customer needs, while maintaining a fast rate of improvement. Your role will involve managing complex challenges of scale unique to Google Cloud, using expertise in coding, algorithms, complexity analysis, and large-scale system design.
The job involves writing product or system development code, reviewing code from other engineers, contributing to documentation, troubleshooting issues, and participating in design reviews. You'll work on optimizing existing systems, building infrastructure, and automating processes.
Google's SRE team values diversity, intellectual curiosity, problem-solving, and openness. The company encourages collaboration, big thinking, and risk-taking in a blame-free environment. You'll have the opportunity to work on meaningful projects with the support and mentorship needed to learn and grow.
Key responsibilities include managing project priorities and deliverables, as well as designing, developing, maintaining, and enhancing software solutions. The role requires a bachelor's degree in Computer Science or a related field (or equivalent experience) and at least 2 years of experience with data structures, algorithms, and software development.
Preferred qualifications include experience with distributed systems, storage, or networking; expertise in designing and troubleshooting large-scale systems; and strong problem-solving and communication skills. Join Google's SRE team to tackle exciting challenges in cloud computing and contribute to cutting-edge technology that powers businesses worldwide.