Principal Site Reliability Engineer

Microsoft is a leading technology company building cloud services, software, and hardware for businesses and consumers worldwide.
$139,900 - $274,800
Site Reliability
Principal Software Engineer
Hybrid
5,000+ Employees
8+ years of experience
AI · Enterprise SaaS

Description For Principal Site Reliability Engineer

Microsoft's Azure Data engineering team is seeking a Principal Site Reliability Engineer to join its mission of building the data platform for the age of AI. The role sits on the databases team, which builds and maintains Microsoft's operational database systems.

As a Principal SRE, you'll take a data-driven approach to solving service reliability problems. You'll analyze massive amounts of telemetry and service health indicators in near real-time, perform automated root cause analysis, and implement the mitigations needed to restore SLOs. The role involves close collaboration with engineering teams to improve tooling and automation so that issues are resolved faster.

Key responsibilities include:

  • Building and optimizing solutions for analyzing service health metrics
  • Collaborating with customers to understand pain points around Supportability and SLO attainment
  • Acting as technical point of contact for enterprise customer escalations
  • Implementing service telemetry improvements
  • Providing operational insights to Design and Product teams
  • Enhancing customer experience through proactive alerting

The position offers a base pay range of $139,900 - $274,800 (higher in the SF Bay Area and NYC). Microsoft provides benefits including healthcare, educational resources, savings plans, parental leave, and more.

This is an opportunity for an experienced SRE to have a direct impact on Microsoft's critical database infrastructure while working with cloud technologies and AI-enabled systems. The role combines technical challenge, customer interaction, and influence on product development.

Responsibilities For Principal Site Reliability Engineer

  • Collaborate with engineering teams on building and enhancing tooling and automation solutions
  • Work with customers to understand pain points around Supportability and SLO attainment
  • Handle service escalations and drive issues to resolution
  • Implement changes to service telemetry
  • Provide proactive alerting based on utilization and trends
  • Analyze data and provide operational insights to Design and Product teams

Requirements For Principal Site Reliability Engineer

Java, JavaScript, Python
  • 8+ years of technical experience in software engineering, network engineering, or systems administration
  • Experience with distributed systems and databases
  • Master's Degree in Computer Science or related technical field preferred
  • Hands-on experience managing live-site operations
  • Experience leading incident response for distributed systems
  • Experience with root cause analysis and performance tuning

Benefits For Principal Site Reliability Engineer

Medical Insurance, Education Budget, Parental Leave
  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect
