Site Reliability Engineering II

Microsoft

Microsoft is a company that builds cloud platforms, software, and services, leading digital transformation in the age of cloud and AI.

Redmond, WA, USA

$98,300 - $193,200

Site Reliability

Mid-Level Software Engineer

Remote

5,000+ Employees

4+ years of experience

Enterprise SaaS · Cloud

Description For Site Reliability Engineering II

Microsoft's Azure Data engineering team is seeking a Site Reliability Engineer II to join its databases team, focusing on operational database systems. This role is part of Azure Cosmos DB, Microsoft's globally distributed, massively scalable, multi-model cloud database service.

As an SRE II, you'll be responsible for maintaining and improving service reliability for one of Azure's fastest-growing services. The position involves working with critical systems in Healthcare, Retail, Telecommunications, and IoT, where service availability and latency are paramount. Azure Cosmos DB provides financially backed SLAs of 99.99% availability and <10ms latency.
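To put the 99.99% availability figure in concrete terms, the arithmetic behind a downtime budget can be sketched as below. This is only an illustration: the function name and the 30-day and 365-day windows are assumptions made for the example, not terms of the Azure Cosmos DB SLA, which defines its own measurement rules.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Minutes of unavailability permitted in a period at a given availability target.

    Hypothetical helper for illustration; the actual SLA defines how
    availability is measured and over what window.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Assumed windows: a 30-day month and a 365-day year.
    print(f"{downtime_budget_minutes(99.99, 30):.2f} minutes per 30 days")
    print(f"{downtime_budget_minutes(99.99, 365):.2f} minutes per 365 days")
```

Under these assumptions, a 99.99% target leaves roughly 4.3 minutes of downtime in a 30-day month, which is why automated detection and mitigation matter for a service at this level.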

Key responsibilities include:

Building and optimizing solutions for analyzing massive amounts of telemetry and service health indicators in near real-time
Performing automated root cause analysis and implementing necessary mitigations to restore SLOs
Collaborating with engineering teams on automation solutions
Working directly with enterprise customers to resolve service escalations
Contributing to the enhancement of customer-facing experiences through proactive monitoring and alerting

The role offers competitive compensation ($98,300 - $193,200 base salary range) and comprehensive benefits including healthcare, educational resources, and parental leave. This is a remote-friendly position with up to 100% work from home flexibility and 0-25% travel requirements.

The ideal candidate will bring 4+ years of technical experience in software engineering or systems administration, with specific expertise in SRE practices and cloud services. You'll join a diverse team that values different perspectives and operates with a startup mindset while having the resources and impact of a global technology leader.

This is an excellent opportunity for someone passionate about service reliability, automation, and working with cutting-edge cloud technology at scale. You'll be at the forefront of building and shaping the Livesite Automation and AI Ops stack in Cosmos DB, leading the path for broader adoption across Microsoft Azure.

Last updated 2 days ago

Responsibilities For Site Reliability Engineering II

Collaborating with engineering teams on building and enhancing tooling and automation solutions
Working with customers to understand pain points around Supportability and SLO attainment
Implementing changes to service telemetry for automation consumption
Enhancing the customer-facing experience through proactive alerting
Analyzing data and providing operational insights to Design and Product teams
Interfacing with large enterprise customers to handle service escalations

Requirements For Site Reliability Engineering II

Python

4+ years technical experience in software engineering, network engineering, or systems administration
3+ years of SRE or SWE experience running large scale cloud services
2+ years of operational experience in improving Service Reliability, Availability and Performance
Understanding of observability and MELT (metrics, events, logs, traces) implementation patterns for large-scale services
Experience with Logic Apps and with authoring Jupyter Notebooks
Systematic problem-solving approach with effective communication skills
Ability to deal with ambiguity in a fast-paced environment

Benefits For Site Reliability Engineering II

Medical Insurance

Parental Leave

401k

Education Budget

Industry leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs
Opportunities to network and connect
