Principal Site Reliability Engineer

Microsoft is a company where passionate innovators come to collaborate, envision what can be and take their careers further in a cloud-enabled world.
$139,900 - $274,800
Site Reliability
Principal Software Engineer
Remote
5,000+ Employees
6+ years of experience
Enterprise SaaS · Cloud

Description For Principal Site Reliability Engineer

Microsoft's Azure Data engineering team is seeking a Principal Site Reliability Engineer to join its databases team, focusing on Azure Cosmos DB, a globally distributed, massively scalable, multi-model cloud database service. This role combines technical expertise with a focus on service reliability to maintain Microsoft's operational database systems.

The position offers an opportunity to work with cutting-edge technology in a team that operates like a startup while being part of one of the world's largest tech companies. You'll be responsible for meeting SLAs of 99.99% availability and under 10 ms latency for critical systems used in healthcare, retail, telecommunications, and IoT applications.

As a Principal SRE, you'll focus on automating root cause analysis and issue mitigation, often addressing problems before they impact customers. The role requires a data-driven approach to solving service reliability problems: analyzing massive amounts of telemetry and implementing automated solutions to maintain service level objectives (SLOs).

The position offers competitive compensation ($139,900 - $274,800 base salary range, higher in the SF and NYC areas) and comprehensive benefits including healthcare, educational resources, savings plans, and parental leave. You'll be part of Microsoft's inclusive culture that values diverse perspectives and collaborative problem-solving.

Key responsibilities include building automation solutions, collaborating with customers on supportability issues, implementing service telemetry, and providing operational insights to product teams. The ideal candidate will have 6+ years of technical engineering experience, strong coding skills, and extensive experience with large-scale cloud services.

This is an excellent opportunity for a seasoned SRE professional who wants to make a significant impact on one of Microsoft's fastest-growing Azure services while working with cutting-edge cloud technology and contributing to systems that serve millions of users worldwide.

Responsibilities For Principal Site Reliability Engineer

  • Collaborate with engineering teams to build and enhance tooling and automation solutions
  • Work with customers to understand pain points around supportability and SLO attainment
  • Communicate technically and interface with enterprise customers on service escalations
  • Design and implement service telemetry changes
  • Enhance the customer-facing experience through proactive alerting
  • Analyze data and provide operational insights to design and product teams

Requirements For Principal Site Reliability Engineer

Python · Java · JavaScript

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding
  • 6+ years of experience running large scale cloud services
  • 3+ years of operational experience improving service reliability, availability, and performance
  • Understanding of observability and MELT (metrics, events, logs, traces) implementation patterns for large-scale services
  • Experience with Logic Apps and authoring Jupyter Notebooks
  • Expertise in analyzing, troubleshooting, and automating root cause analysis
  • Systematic problem-solving approach with effective communication skills
  • Ability to deal with ambiguity in a fast-paced environment

Benefits For Principal Site Reliability Engineer

Medical Insurance · Parental Leave · Vision Insurance · Dental Insurance · 401k

  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect
