Support Engineer - Incident Management, AWS Incident Response (AIR)

Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform.
Backend
Mid-Level Software Engineer
Hybrid
5,000+ Employees
3+ years of experience
Enterprise SaaS · Cloud
This job posting may no longer be active.

Description For Support Engineer - Incident Management, AWS Incident Response (AIR)

AWS Incident Response is at the heart of the high availability of Amazon Web Services. We make customer-impacting events shorter and less frequent by providing large-scale event and incident management. Our automated tooling quickly identifies the cause of an issue and helps mitigate its impact, and much of our engineering time is spent on projects to improve that tooling and automation. We also provide manual incident management for AWS and other Amazon groups, directing the resolution of issues with service teams and diving deep into those events to drive improvements to the tooling.

As a Support Engineer on the team, you will:

  • Lead projects and build processes to reduce the duration, frequency, and impact of issues within the AWS and Amazon infrastructure.
  • Direct the resolution of high-visibility incidents by leading conference calls and teams across the globe.
  • Drive improvements to our automation, tooling, and processes based on data gathered from incidents.
  • Participate in project teams to expand use of our tooling to additional areas across Amazon.
  • Have the opportunity to grow your coding skills by taking on development projects matched to your ability level.

Key responsibilities include:

  • Drive the resolution of large-scale, customer-impacting issues as part of a team rotation, including some weekends and holidays
  • Identify and troubleshoot recurring platform issues and own projects to drive improvements
  • Participate in Agile sprints to evolve business processes and technologies
  • Create and review documentation; design new standard operating procedures
  • Mentor peers in your areas of technical and operational strength
  • Lead projects and teams across the globe to drive operational improvements

The AWS Incident Response (AIR) team is Amazon's central defense against large-scale incidents and drives operational excellence across all of Amazon's businesses. Our engineers are front and center in driving down event duration through experience in operational excellence, current best practices, and incident management tooling.

Join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You'll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers.

Last updated 7 months ago

Responsibilities For Support Engineer - Incident Management, AWS Incident Response (AIR)

  • Drive resolution of large-scale, customer-impacting issues
  • Identify and troubleshoot recurring platform issues
  • Participate in Agile sprints to evolve business processes and technologies
  • Create and review documentation; design new standard operating procedures
  • Mentor peers in technical and operational areas
  • Lead projects and teams across the globe to drive operational improvements

Requirements For Support Engineer - Incident Management, AWS Incident Response (AIR)

  • Linux
  • Experience troubleshooting and debugging technical systems
  • Experience in agile/scrum or related collaborative workflows
  • Experience troubleshooting and documenting findings
  • 3+ years of technical support or related experience

Benefits For Support Engineer - Incident Management, AWS Incident Response (AIR)

  • Medical Insurance
  • Dental Insurance
  • Vision Insurance
  • Mentorship & Career Growth
  • Work/Life Balance
  • Inclusive Team Culture
