
Site Reliability Engineer - Met Office

Microsoft is a global technology company on a mission to empower every person and organization on the planet to achieve more.
Site Reliability
Senior Software Engineer
Remote
5,000+ Employees
5+ years of experience
Enterprise SaaS · Cloud
This job posting is no longer active.

Job Description

Microsoft's Azure Customer Experience (CXP) team is seeking a Site Reliability Engineer to work on a High-Performance Computing (HPC) environment. This role is crucial in driving reliability engineering excellence within Azure's cloud infrastructure. The position combines technical expertise with customer interaction, focusing on maintaining and improving system reliability, availability, and performance.

The role operates within a fast-paced, agile team environment that emphasizes a startup-like culture. Key responsibilities include collaborating with SRE teams on automation solutions, working directly with customers to resolve pain points, and implementing proactive monitoring and alerting systems. The successful candidate will be instrumental in enhancing service telemetry and providing operational insights to Design and Product teams.

The position offers significant technical challenges in a supportive environment, with access to cutting-edge technology and collaboration with world-class engineers. The team's philosophy centers on a customer-first approach, trust building, high responsiveness, and continuous improvement through automation and toil reduction.

This role requires a blend of technical expertise in software engineering or systems administration, strong problem-solving abilities, and excellent communication skills. The position involves regular travel to customer sites in South West UK and requires maintaining various security clearances. The role offers comprehensive benefits, including healthcare, educational resources, and various professional development opportunities.

Working at Microsoft means joining a company committed to empowering others through technology, with a strong focus on diversity, inclusion, and a growth-mindset culture. The position allows up to 100% remote work but requires 25-50% travel.

Last updated 2 months ago

Responsibilities For Site Reliability Engineer - Met Office

  • Collaborating with SRE teams on building and enhancing tooling and automation solutions
  • Working with customers to understand pain points around Supportability and SLO attainment
  • Handling service escalations and driving issues to resolution
  • Implementing changes to service telemetry
  • Enhancing customer facing experience through proactive alerting
  • Analyzing data and providing operational insights

Requirements For Site Reliability Engineer - Met Office

Linux
Kubernetes
  • In-depth technical experience in software engineering, network engineering, or systems administration
  • Operational experience in improving Service Reliability, Availability and Performance
  • Systematic problem-solving approach with effective communication skills
  • Expertise in analyzing and troubleshooting distributed systems
  • Ability to travel to the customer site in South West UK
  • Prior HPC knowledge preferred

Benefits For Site Reliability Engineer - Met Office

Medical Insurance
Education Budget
Parental Leave
Mental Health Assistance
Vision Insurance
Dental Insurance
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Networking opportunities