Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an Engineering Manager II for AdsML SRE, you will lead a team responsible for ensuring the reliability and uptime of Google's services, both internal and external. You'll manage complex challenges of scale unique to Google, using your expertise in coding, algorithms, complexity analysis, and large-scale system design.
The role requires a strong background in software development, data structures, and algorithms, with a focus on distributed systems. You'll be responsible for leading projects, mentoring team members, and driving technical excellence. Key responsibilities include managing service availability and performance, automating problem prevention and response, and continuously improving Google's infrastructure.
SRE at Google values diversity, intellectual curiosity, and a blame-free approach to problem-solving. You'll work with colleagues from diverse backgrounds and perspectives, collaborating on meaningful projects while receiving support and mentorship for your own growth.
This position is part of the Technical Infrastructure team, which plays a crucial role in developing and maintaining Google's data centers and platforms. Your work will directly affect the user experience across Google's product portfolio by keeping its services fast and reliable.
Join Google's SRE team to tackle challenging problems in large-scale distributed systems, contribute to cutting-edge technology, and lead a team of talented engineers in shaping the future of Google's infrastructure.