Microsoft's Azure Customer Experience (CXP) team is seeking a Site Reliability Engineer to work in a High-Performance Computing (HPC) environment. This role is crucial to driving reliability engineering excellence within Azure's cloud infrastructure. The position combines technical expertise with customer interaction, focusing on maintaining and improving system reliability, availability, and performance.
The role operates within a fast-paced, agile team environment that emphasizes a startup-like culture. Key responsibilities include collaborating with SRE teams on automation solutions, working directly with customers to resolve pain points, and implementing proactive monitoring and alerting systems. The successful candidate will be instrumental in enhancing service telemetry and providing operational insights to Design and Product teams.
The position offers significant technical challenges in a supportive environment, with access to cutting-edge technology and collaboration with world-class engineers. The team's philosophy centers on a customer-first approach, trust building, high responsiveness, and continuous improvement through automation and toil reduction.
This role requires a blend of technical expertise in software engineering or systems administration, strong problem-solving abilities, and excellent communication skills. The position involves regular travel to customer sites in the South West of the UK and requires maintaining various security clearances. The role offers comprehensive benefits, including healthcare, educational resources, and various professional development opportunities.
Working at Microsoft means joining a company committed to empowering others through technology, with a strong focus on diversity, inclusion, and a growth-mindset culture. The position offers the flexibility of up to 100% remote work, though it requires 25-50% travel when necessary.