Principal Software Engineer

Microsoft empowers every person and organization on the planet to achieve more through innovative technology solutions.
Site Reliability
Principal Software Engineer
Hybrid
5,000+ Employees
10+ years of experience
Enterprise SaaS

Description For Principal Software Engineer

Microsoft's COSMIC team is seeking a Principal Software Engineer to join its Site Reliability Engineering team. The role focuses on maintaining and enhancing a global-scale managed-runtime environment, built on Azure Kubernetes Service, for the Microsoft Substrate service and its developers. COSMIC operates as a 'Kubernetes PaaS', providing critical infrastructure components for deployment, upgrades, security, observability, and debugging.

As a Principal Engineer, you'll be responsible for ensuring the health and reliability of the COSMIC platform, managing agent updates, and implementing automated solutions for incident response and remediation. The position requires deep expertise in distributed systems, cloud technologies, and DevOps practices, with a focus on building highly scalable and reliable services.

The role offers an opportunity to work with cutting-edge cloud technology at massive scale, directly impacting Microsoft's core infrastructure. You'll collaborate with various stakeholders, lead technical initiatives, and mentor team members while maintaining high standards for code quality and system reliability.

Microsoft offers comprehensive benefits including industry-leading healthcare, educational resources, investment options, and generous parental leave. The position is hybrid, allowing up to 50% work from home, with 0-25% travel requirements. This is an excellent opportunity for experienced engineers looking to make a significant impact at one of the world's leading technology companies.

The ideal candidate will bring 10+ years of technical engineering experience, strong leadership skills, and a proven track record of building and maintaining large-scale distributed systems. Experience with Azure cloud services and Kubernetes is highly valuable for this position.

Last updated 11 days ago

Responsibilities For Principal Software Engineer

  • Drive the design and implementation of features, services, and components that incorporate dependencies on other applications and tech stacks
  • Collaborate with stakeholders to determine user requirements and incorporate feedback
  • Lead by example within the team by producing extensible and maintainable code
  • Review team members' code to ensure quality standards and reliability
  • Maintain the health of the COSMIC platform by keeping its agents updated and upgraded
  • Debug issues and implement auto-remediation solutions

Requirements For Principal Software Engineer

Kubernetes
  • Bachelor's Degree in Computer Science or related field AND 10+ years of technical engineering experience
  • 5+ years of experience designing, developing, and shipping reliable distributed systems
  • DevOps experience maintaining live services
  • Cloud and services experience; Azure experience highly desired
  • Experience with Agile development processes
  • Must pass Microsoft Cloud Background Check

Benefits For Principal Software Engineer

Medical Insurance
Dental Insurance
Vision Insurance
Education Budget
Parental Leave
401k
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect


Jobs Related To Microsoft Principal Software Engineer

Principal Site Reliability Engineer

Principal Site Reliability Engineer role at Microsoft, focusing on Azure Cosmos DB service reliability and automation, offering remote work and competitive compensation.

Director, Software Engineering, Site Reliability

Lead Site Reliability Engineering at LinkedIn, directing 40+ engineers to ensure reliability of critical infrastructure systems including streaming, batch processing, and data platforms.

Principal Software Engineer - Site Reliability Engineering

Principal SRE position at Roblox leading reliability initiatives, building resilient systems, and mentoring engineers to support platform scaling for millions of users.

Director, Software Engineering, Site Reliability

Lead a 40+ person Site Reliability Engineering team at LinkedIn Bengaluru, focusing on infrastructure reliability, automation, and system scalability.

Director, Software Engineering, Site Reliability

Lead LinkedIn's Site Reliability Engineering team in Bengaluru, directing 40+ engineers and driving infrastructure reliability for critical systems.