Principal Site Reliability Engineer

Microsoft

Microsoft is a company where passionate innovators come to collaborate, envision what can be and take their careers further in a cloud-enabled world.

Redmond, WA, USA

$139,900 - $274,800

Site Reliability

Principal Software Engineer

Remote

5,000+ Employees

6+ years of experience

Enterprise SaaS · Cloud

Description For Principal Site Reliability Engineer

Microsoft's Azure Data engineering team is seeking a Principal Site Reliability Engineer to join its databases team, focusing on Azure Cosmos DB, a globally distributed, massively scalable, multi-model cloud database service. This role combines technical expertise with a focus on service reliability to maintain Microsoft's operational database systems.

The position offers an opportunity to work with cutting-edge technology in a team that operates like a startup while being part of one of the world's largest tech companies. You'll be responsible for ensuring 99.99% availability and <10ms latency SLAs for critical systems used in Healthcare, Retail, Telecommunications, and IoT applications.

As a Principal SRE, you'll focus on automating root cause analysis and issue mitigation, often addressing problems before they impact customers. The role requires a data-driven approach to solving service reliability problems: analyzing massive amounts of telemetry and implementing automated solutions to maintain service level objectives (SLOs).

The position offers competitive compensation ($139,900 - $274,800 base salary range, higher in SF and NYC areas) and comprehensive benefits including healthcare, educational resources, savings plans, and parental leave. You'll be part of Microsoft's inclusive culture that values diverse perspectives and collaborative problem-solving.

Key responsibilities include building automation solutions, collaborating with customers on supportability issues, implementing service telemetry, and providing operational insights to product teams. The ideal candidate will have 6+ years of technical engineering experience, strong coding skills, and extensive experience with large-scale cloud services.

This is an excellent opportunity for a seasoned SRE professional who wants to make a significant impact on one of Microsoft's fastest-growing Azure services while working with cutting-edge cloud technology and contributing to systems that serve millions of users worldwide.

Last updated 11 hours ago

Responsibilities For Principal Site Reliability Engineer

Collaborate with engineering teams on building and enhancing tooling and automation solutions
Work with customers to understand pain points around supportability and SLO attainment
Communicate technically and interface with enterprise customers on service escalations
Design and implement service telemetry changes
Enhance the customer-facing experience through proactive alerting
Analyze data and provide operational insights to design and product teams

Requirements For Principal Site Reliability Engineer

Python

Java

JavaScript

Bachelor's degree in Computer Science or a related technical field AND 6+ years of technical engineering experience with coding
6+ years of experience running large-scale cloud services
3+ years of operational experience improving service reliability, availability, and performance
Understanding of observability and MELT (metrics, events, logs, traces) implementation patterns for large-scale services
Experience with Logic Apps and authoring Jupyter Notebooks
Expertise in analyzing, troubleshooting, and automating root cause analysis
Systematic problem-solving approach with effective communication skills
Ability to deal with ambiguity in a fast-paced environment

Benefits For Principal Site Reliability Engineer

Medical Insurance

Parental Leave

Vision Insurance

Dental Insurance

401k

Industry leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs
Opportunities to network and connect
