Join AWS's Machine Learning Infrastructure team at Annapurna Labs, where you'll lead the development of critical tools and infrastructure supporting AWS ML and High Performance Computing technologies. As a Senior Software Development Engineer, you'll build and maintain the infrastructure that monitors and reports on the functionality and performance of large-scale testing workloads. The role combines expertise in CI/CD automation, ML/HPC benchmarks, and cloud infrastructure to support cutting-edge AWS offerings, including Trainium, Graviton, and Elastic Fabric Adapter (EFA).
You'll work with technologies like TypeScript, CDK, SLURM, and Active Directory to create efficient, cost-effective cluster management solutions. The position requires strong technical leadership, with opportunities to mentor other engineers and communicate effectively with stakeholders. You'll be part of Annapurna Labs, an AWS subsidiary focused on building innovative software and hardware solutions that make ML on EC2 more effective.
The team's mission is to make AWS the most cost-effective platform for AI at scale. You'll join a culture that values diversity, work-life harmony, and continuous learning. The role offers comprehensive benefits, career growth opportunities, and the chance to work with the latest ML and cloud technologies. This position suits experienced engineers who want to shape the future of cloud-based machine learning infrastructure while working alongside a dedicated team of innovators.