Borg Lifecycle Site Reliability Engineer

Google

Google is a global technology leader specializing in internet-related services and products.

Warsaw, Poland

Site Reliability

Mid-Level Software Engineer

In-Person

5,000+ Employees

5+ years of experience

Enterprise SaaS · AI

Description For Borg Lifecycle Site Reliability Engineer

Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As a Borg Lifecycle SRE, you'll be responsible for ensuring Google Cloud's services maintain reliability and appropriate uptime while managing complex challenges of scale unique to Google Cloud. The role involves optimizing existing systems, building infrastructure, and automating processes.

The position requires expertise in coding, algorithms, complexity analysis, and large-scale system design. You'll work specifically with the Borg infrastructure, which is crucial to Google's operations, handling diverse challenges across global infrastructure and working on high-impact projects that drive innovation.

The SRE team at Google embraces a culture of diversity, intellectual curiosity, and problem-solving in a blame-free environment. You'll collaborate with people from various backgrounds and perspectives, working on meaningful projects while receiving support and mentorship for professional growth. The role combines technical leadership, hands-on engineering, and production support, making it ideal for those interested in large-scale distributed systems and infrastructure management.

Key aspects include managing Borg lifecycle phases, supporting different cell flavors, and participating in on-call rotations to ensure 24/7 reliability. You'll work closely with development teams and other SREs to design and implement scalable, reliable, and secure solutions that support various Google initiatives. This role offers an opportunity to impact Google's infrastructure at a global scale while working with cutting-edge technology and talented engineers.

Last updated 8 months ago

Responsibilities For Borg Lifecycle Site Reliability Engineer

Drive the technical direction for the Borg Lifecycle SRE team
Provide ongoing engineering and production support for Borg lifecycle phases (turnup, turndown, cell management) and support different Borg cell flavors
Work with partner development and SRE teams to design and deliver different programs and projects in a scalable, reliable, and secure manner
Design and develop innovative solutions that enable key Google initiatives and scale with the requirements of the business
Be a full member of the Borg SRE on-call rotation(s). Support the Borg ecosystem at global scale and ensure production keeps running for our users

Requirements For Borg Lifecycle Site Reliability Engineer

Python

Java

Linux

5 years of experience with performance, system architecture, systems data analysis, visualization tools, and debugging
Coding and scripting experience in one or more languages (Python, Perl, C, C++ or Java)
Master's degree or PhD in Engineering, Computer Science, or a related technical field (preferred)
5 years of experience with UNIX/Linux (preferred)
Experience with cloud solutions: open source software communities, cloud networking solutions, distributed-computing technology, and hybrid/multi-cloud connectivity (preferred)