Senior Site Reliability Engineer

Microsoft is a company where passionate innovators come to collaborate, envision what can be and take their careers further in a cloud-enabled world.
$108,100 - $199,700
Site Reliability
Senior Software Engineer
Hybrid
5,000+ Employees
6+ years of experience
Enterprise SaaS · AI · Cloud

Description For Senior Site Reliability Engineer

Microsoft's Azure Data engineering team is seeking a Senior Site Reliability Engineer to join its databases team, working specifically on Azure Cosmos DB. The role focuses on maintaining Microsoft's operational database systems and ensuring high availability and performance. It involves working with globally distributed, massively scalable cloud database services, building and optimizing solutions for automated root cause analysis, and maintaining strict Service Level Objectives (SLOs). The ideal candidate will take a data-driven approach to solving service reliability problems, collaborate with engineering teams, and work directly with enterprise customers. This is an opportunity to work with cutting-edge technology in a fast-paced environment and contribute to one of Azure's fastest-growing services. The role offers competitive compensation, comprehensive benefits, and the chance to impact critical systems across the healthcare, retail, telecommunications, and IoT sectors. Microsoft values diversity and encourages applications from candidates with different experiences and perspectives.

Last updated 16 days ago

Responsibilities For Senior Site Reliability Engineer

  • Collaborating with engineering teams on building and enhancing tooling and automation solutions
  • Working with customers to understand pain points around Supportability and SLO attainment
  • Designing and implementing service telemetry changes
  • Enhancing the customer-facing experience through proactive alerting
  • Analyzing data and providing operational insights to Design and Product teams
  • Being the single point of contact for large enterprise customers for service escalations

Requirements For Senior Site Reliability Engineer

Python
Java
  • 6+ years of technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree with 3+ years of experience OR Master's Degree with 2+ years of experience
  • Understanding of Observability and MELT (metrics, events, logs, traces) implementation patterns for large-scale services
  • Experience in Logic Apps and authoring Jupyter Notebooks
  • Experience in analyzing and troubleshooting large-scale distributed systems
  • 5+ years of SRE or SWE experience running large-scale cloud services
  • 5+ years of hands-on experience in Python/Java/C#
  • 3+ years of operational experience in improving Service Reliability
  • Must pass Microsoft Cloud Background Check

Benefits For Senior Site Reliability Engineer

Medical Insurance
Education Budget
Parental Leave
Mental Health Assistance
  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect
