
System Development Manager, AWS Resilience, AWS Incident Response

Amazon is a global technology company that provides cloud computing, e-commerce, artificial intelligence, and digital streaming services.
Backend
Staff Software Engineer
In-Person
5+ years of experience
Enterprise SaaS

Description For System Development Manager, AWS Resilience, AWS Incident Response

AWS Resilience owns the services that prevent and respond to availability and security issues across all AWS services. As a System Development Manager on the AWS Incident Response team, you will manage the roadmap and delivery of automated tooling that detects and resolves issues within AWS and Amazon infrastructure. You will also direct the resolution of high-visibility incidents, drive improvements in automation, tooling, and processes, and coordinate across project teams to expand the use of our tooling. Key responsibilities include defining and delivering business priorities, cross-site and cross-team coordination, incident and change management, and performance management and team health. This role offers strong growth potential and the opportunity to make a large impact on keeping the cloud running.

Last updated 10 months ago

Responsibilities For System Development Manager, AWS Resilience, AWS Incident Response

  • Define, plan, track and deliver strategic goals for the global AWS Incident Response team
  • Coordinate with counterparts to ensure clear communication between AWS Operations teams
  • Work with systems and product teams to create and maintain proper processes for monitoring and alarming on services
  • Manage inquiries regarding engagement processes and issues within the global Amazon platform
  • Drive initiatives to improve existing tools & processes
  • Provide feedback on new practices & procedures so they scale as AWS Services expand
  • Own all facets of performance and career management for the team

Requirements For System Development Manager, AWS Resilience, AWS Incident Response

  • 5+ years of direct experience with cloud hosting technologies (AWS, Azure, etc.)
  • 5+ years of experience managing an engineering team operating at scale
  • Deep understanding of infrastructure delivered through the software development lifecycle in an API-enabled environment
  • Experience in implementing, supporting, and evaluating tools and services with a security, scalability, and performance mindset
  • Ability to handle multiple competing priorities in a fast-paced environment
  • Ability to interact with and influence people at all levels
  • Excellent written and verbal communication skills

Benefits For System Development Manager, AWS Resilience, AWS Incident Response

  • Equal opportunity employer
  • Diverse and inclusive workplace