Senior Site Reliability Engineer

Microsoft is a global technology company that empowers every person and organization on the planet to achieve more.
$117,200 - $229,200
Site Reliability
Senior Software Engineer
Hybrid
5,000+ Employees
6+ years of experience
Enterprise SaaS · Cloud

Description For Senior Site Reliability Engineer

Microsoft Cloud Operations & Innovation (CO+I) is seeking a Senior Site Reliability Engineer to join its Cloud Infrastructure Health team. This role is at the forefront of Microsoft's cloud computing transformation, working with state-of-the-art distributed systems that process petabyte-scale telemetry and apply machine learning to meet cloud availability and safety goals.

The position involves maintaining and improving the critical datacenter infrastructure that powers Microsoft's cloud services, which enable approximately 30% of Microsoft's revenue through Commercial Cloud operations. You'll work on systems that analyze massive amounts of telemetry data, both in real time and offline, to deliver time-sensitive insights that directly affect Cloud Operations.

As an SRE, you'll be responsible for ensuring service reliability and scalability, participating in design reviews, and developing automation tools. The role combines software development expertise with operations knowledge to build and maintain highly available systems. You'll collaborate with feature teams to increase deployment velocity while ensuring safety, analyze telemetry data for capacity planning, and participate in on-call rotations to resolve live site incidents.

The ideal candidate will bring 6+ years of technical experience in software engineering or systems administration, with particular expertise in system uptime, performance monitoring, and capacity planning. You'll be working in a hybrid environment (up to 50% work from home) with some travel requirements (0-25%).

Microsoft offers comprehensive benefits including industry-leading healthcare, educational resources, savings and investment options, parental leave, and generous time off. The position offers competitive compensation with a base pay range of $117,200 - $229,200 per year (varies by location).

Join a team that's essential to Microsoft's cloud infrastructure, where you'll have the opportunity to make a significant impact on global-scale systems while working with cutting-edge technology and talented engineers.


Responsibilities For Senior Site Reliability Engineer

  • Own deployment, availability, reliability, performance and customer escalation targets for Critical Environment Telemetry solutions
  • Design, develop, and maintain data pipelines and back-end services for real-time decisioning
  • Write high-quality, maintainable, high-performance code
  • Manage automated unit and integration test suites
  • Work with Project Managers and business stakeholders to design and deliver new features
  • Identify opportunities and drive implementation of monitoring and automation capabilities
  • Investigate and resolve Customer Reported Incidents

Requirements For Senior Site Reliability Engineer

Kubernetes
Linux
  • 6+ years technical experience in software engineering, network engineering, or systems administration
  • 2+ years of experience with system uptime, performance, service monitoring, and capacity planning
  • Bachelor's Degree in Computer Science, Information Technology, or related field
  • Must pass Microsoft Cloud background check
  • Experience with high-scale distributed systems
  • Strong programming and system design skills

Benefits For Senior Site Reliability Engineer

Medical Insurance
Dental Insurance
Vision Insurance
Parental Leave
401k
  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

