Infrastructure Reliability Engineer, Infrastructure Reliability & Quality

Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, pioneering cloud computing.
DevOps
Senior Software Engineer
In-Person
5,000+ Employees
6+ years of experience
Enterprise SaaS · Cloud

Description For Infrastructure Reliability Engineer, Infrastructure Reliability & Quality

AWS Infrastructure Services is at the heart of Amazon's cloud operations, responsible for the design, planning, delivery, and operation of all AWS global infrastructure. This team maintains the foundation of AWS's cloud services, managing data centers, servers, storage, networking, and essential infrastructure equipment worldwide. As an Infrastructure Reliability Engineer, you'll join a diverse team tackling complex challenges in maintaining and improving AWS's vast infrastructure network.

The role combines technical expertise in reliability engineering with practical problem-solving skills. You'll work on critical systems that directly impact AWS's ability to provide consistent, high-quality cloud services to customers globally. Your responsibilities will span from proactive risk assessment to hands-on troubleshooting of sophisticated datacenter equipment.

This position offers unique opportunities to work with cutting-edge technology while contributing to the reliability of one of the world's largest cloud infrastructures. You'll collaborate with talented engineers across AWS, participate in critical decision-making processes, and drive improvements that impact millions of AWS customers.

The ideal candidate brings strong analytical skills, deep technical knowledge of infrastructure systems, and the ability to work effectively with both technical and business stakeholders. This role requires a combination of engineering expertise, project management skills, and the ability to drive results in a fast-paced environment.

AWS offers comprehensive benefits, including medical insurance, vision and dental coverage, and parental leave. The company promotes work-life harmony and provides extensive opportunities for professional growth through mentorship programs and hands-on experience with advanced technologies.

Join a team that values innovation, embraces diverse perspectives, and is committed to maintaining AWS's position as the world's leading cloud platform. Your work will directly contribute to the reliability and efficiency of AWS's global infrastructure, making a real impact on cloud computing worldwide.

Responsibilities For Infrastructure Reliability Engineer, Infrastructure Reliability & Quality

  • Drive reliability risk identification, assessment and mitigation for datacenter infrastructure equipment
  • Conduct root cause analysis of critical equipment failures
  • Drive continuous improvements to datacenter availability
  • Work with internal and external partners, including suppliers
  • Develop and implement analytical and empirical approaches for product quality/reliability
  • Drive AWS application-specific requirements for lifecycle environmental and operational stress analysis
  • Develop a datacenter system-level reliability model
  • Monitor product performance and drive corrective actions
  • Conduct vendor auditing and quarterly reviews

Requirements For Infrastructure Reliability Engineer, Infrastructure Reliability & Quality

Linux
Kubernetes
  • Bachelor's or Master's degree in Reliability Engineering, Physics, Electrical, Mechanical, or Materials Engineering, or a related field
  • 6+ years of Reliability Engineering work experience in a high-reliability industry
  • 4+ years of experience with failure analysis and root cause analysis
  • 4+ years of experience with accelerated life testing, stress analysis, and finite element analysis

Benefits For Infrastructure Reliability Engineer, Infrastructure Reliability & Quality

  • Medical Insurance
  • Vision Insurance
  • Dental Insurance
  • Parental Leave
  • Work-life harmony
  • Mentorship and career growth opportunities
  • Employee-led affinity groups
  • Inclusive team culture

Jobs Related To Amazon Infrastructure Reliability Engineer, Infrastructure Reliability & Quality

Systems Development Engineer, TrafficShift

Systems Development Engineer role at AWS TrafficShift team, focusing on network automation and availability services.

Sr. DevOps Operation Engineer

Senior DevOps Operation Engineer role at Amazon Devices, focusing on maintaining and optimizing development infrastructure for innovative consumer technology products.

LiT - Sr. Operations Engineer, World Wide Engineering Innovation

Senior Operations Engineer role at Amazon leading cross-functional projects in sortation and distribution solutions, combining technical expertise with operational leadership.

ML Support Engineer IV

Senior ML Support Engineer role at Amazon combining DevOps, Systems, and Software Engineering to build and maintain critical pricing infrastructure.

Sr. DevOps Operation Engineer

Senior DevOps Operation Engineer position at Amazon Devices, focusing on maintaining and optimizing development lifecycle for Kindle and other Amazon devices.