
Software Development Engineer, AWS Incident Tooling & Response

Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, pioneering cloud computing.
Backend
Mid-Level Software Engineer
In-Person
5,000+ Employees
2+ years of experience
Enterprise SaaS · Cloud
This job posting may no longer be active.

Description For Software Development Engineer, AWS Incident Tooling & Response

Amazon Web Services (AWS) is seeking a Software Development Engineer to join its Incident Response Systems team. This role is crucial in building and maintaining systems that ensure AWS customers can rely on the highest-availability, lowest-latency cloud platform globally. The position involves working with AWS's largest product teams to develop systems that detect and mitigate operational issues before they impact customers.

As an SDE in this role, you'll be responsible for designing and implementing automated systems for fault containment, problem diagnosis, and issue resolution across multiple distributed architectures. These systems will analyze metric and dependency data from various sources, correlating them with customer impact to determine root causes without human intervention. The goal is to create solutions that can detect, diagnose, and repair operational defects automatically, maintaining AWS's reputation for stability and reliability.

The role offers significant growth opportunities, with exposure to senior technical leaders across AWS within the first year. You'll be designing and implementing new systems while investigating historical customer-impacting events to prevent future occurrences. The position is part of AWS Infrastructure Services, which manages all AWS global infrastructure, including data centers, servers, storage, networking, and cooling equipment.

The team culture emphasizes diversity, continuous learning, and work-life harmony. You'll join a collaborative environment with engineers from various backgrounds, including incident response veterans and traditional software engineers. The team values knowledge-sharing, mentorship, and career development, with access to numerous learning resources and career-advancing opportunities.

Key responsibilities include writing maintainable code, designing customer-focused systems, improving code quality and architecture, and understanding incident management processes. Daily activities involve coding, code reviews, documentation, and operational support. You'll have regular interactions with technical leaders, helping shape both team and organizational direction while focusing on delivering high-quality solutions for customers.

This role is ideal for someone who combines strong technical skills with a passion for building reliable, scalable systems and has an interest in incident response and automated problem-solving in cloud infrastructure.

Last updated 3 months ago

Responsibilities For Software Development Engineer, AWS Incident Tooling & Response

  • Write well-tested, maintainable code
  • Design, contribute to, and maintain systems that solve customer problems
  • Work with teammates to improve code quality, system architecture, and team processes
  • Learn about incident management processes to identify improvement opportunities
  • Design and implement systems that automate fault containment, problem diagnosis, and issue resolution
  • Create engagements, and facilitate communication and coordination of response and mitigation
  • Work with teams across AWS to drive adoption of software

Requirements For Software Development Engineer, AWS Incident Tooling & Response

  • Experience (non-internship) in professional software development
  • Experience designing or architecting new and existing systems
  • Experience programming in at least one programming language

Benefits For Software Development Engineer, AWS Incident Tooling & Response

  • Work-life harmony
  • Flexible working culture
  • Mentorship and career growth opportunities
  • Employee-led affinity groups
  • Ongoing learning experiences