Microsoft's COSMIC team is seeking a Principal Software Engineer to join its Site Reliability Engineering team. The role focuses on maintaining and enhancing a global-scale managed-runtime environment, built on Azure Kubernetes Service, for Microsoft Substrate services and developers. COSMIC operates as a 'Kubernetes PaaS', providing critical infrastructure components for deployment, upgrades, security, observability, and debugging.
As a Principal Engineer, you'll be responsible for ensuring the health and reliability of the COSMIC platform, managing agent updates, and implementing automated solutions for incident response and remediation. The position requires deep expertise in distributed systems, cloud technologies, and DevOps practices, with a focus on building highly scalable and reliable services.
The role offers an opportunity to work with cutting-edge cloud technology at massive scale, directly impacting Microsoft's core infrastructure. You'll collaborate with various stakeholders, lead technical initiatives, and mentor team members while maintaining high standards for code quality and system reliability.
Microsoft offers comprehensive benefits including industry-leading healthcare, educational resources, investment options, and generous parental leave. The position is hybrid, allowing up to 50% work from home, with a travel requirement of 0-25%. This is an excellent opportunity for experienced engineers looking to make a significant impact at one of the world's leading technology companies.
The ideal candidate will bring 10+ years of technical engineering experience, strong leadership skills, and a proven track record of building and maintaining large-scale distributed systems. Experience with Azure cloud services and Kubernetes is highly valuable for this position.