AWS Neuron is seeking a Software Engineer to join its Machine Learning Applications team, focusing on distributed training solutions. The role involves working with AWS's innovative ML accelerators, Inferentia and Trainium, and their corresponding servers (Inf1 and Trn1). You'll be responsible for developing and optimizing distributed training support for various ML models, including large language models like GPT-2/3 and vision transformers.
The position sits within Annapurna Labs, which Amazon acquired in 2015 and which has since delivered numerous successful products including AWS Nitro, ENA, EFA, Graviton, and F1 EC2 instances. You'll work alongside chip architects, compiler engineers, and runtime engineers to create and optimize distributed training solutions using technologies like FSDP and DeepSpeed.
The role combines deep technical expertise in both software development and machine learning, with a focus on performance optimization and scalability. You'll be part of a team that values work-life balance, mentorship, and career growth, with opportunities to work on cutting-edge ML infrastructure that impacts millions of users worldwide.
Amazon offers a comprehensive benefits package and a culture that embraces diversity through various employee-led affinity groups. The company's 16 Leadership Principles emphasize seeking diverse perspectives, continuous learning, and earning trust. The team supports flexible working hours and maintains a balanced approach to professional and personal life.
This is an excellent opportunity for someone passionate about ML infrastructure, distributed systems, and high-performance computing, with the chance to work on technology that powers some of the most advanced ML applications in the cloud computing industry.