At Infrastructure Reliability Engineering within Amazon, we are seeking talented Software Development Engineers to join our team focused on building scalable solutions that ensure the reliability of Amazon's critical systems. Our team develops and operates tools for distributed tracing, network analysis, and event correlation at Amazon scale. We work on detecting and preventing outages to maintain high availability across global infrastructure, directly impacting millions of customers.
The role involves working with core technologies including Java, Python, Linux, and AWS services to deliver intelligent, real-time insights into service-to-service communications and network traffic. You'll help develop solutions that provide visibility into anomalous service behavior and ensure high availability for Amazon's fulfillment and robotics services.
We offer a collaborative environment where you'll work alongside talented engineers, Product Managers, Technical Program Managers, and Senior Leadership. The team values continuous learning and professional growth, encouraging members to expand their skills while solving complex problems that have global impact. You'll have the opportunity to work on greenfield programs while contributing to maintainable, high-quality software built and deployed through continuous delivery.
The ideal candidate is passionate about creating robust automated testing systems, understands the challenges of operating large-scale systems in production, and is driven to deliver high-quality solutions on time. This role offers the chance to make meaningful contributions to Amazon's infrastructure while working with cutting-edge technologies and a supportive team committed to innovation.