Microsoft's Azure Data engineering team is seeking a Principal Site Reliability Engineer to join its databases team, focusing on Azure Cosmos DB, a globally distributed, massively scalable, multi-model cloud database service. This role combines software engineering excellence with operational expertise to ensure high availability and performance of critical cloud infrastructure.
The position offers an opportunity to work with cutting-edge technology while maintaining stringent Service Level Objectives (SLOs) for one of Azure's fastest-growing services. You'll be responsible for building and optimizing solutions that analyze massive amounts of telemetry and service health indicators in near real-time, performing automated root cause analysis, and implementing necessary mitigations to maintain service reliability.
As a Principal SRE, you'll work at the intersection of development and operations, focusing on making on-call engineering more efficient through automation and proactive problem-solving. The role involves collaboration with both engineering teams and enterprise customers, requiring strong technical communication skills and a data-driven approach.
The position offers competitive compensation (base salary range of $137,600 to $267,000), comprehensive benefits, and the opportunity to work remotely. You'll be part of a team that values innovation, inclusion, and a growth mindset while building systems that power some of the largest companies in the healthcare, retail, telecommunications, and IoT sectors.
This role is perfect for someone who combines deep technical expertise with a passion for service reliability, automated problem-solving, and customer success. You'll have the chance to influence product architecture and roadmap while ensuring supportability remains a key consideration in product evolution.
Join Microsoft's Azure Data team to help build the data platform for the age of AI, working with a talented team that operates with the agility of a startup while backed by the resources and stability of a global technology leader.