Join the AWS Neuron team as a Software Engineer focused on AI/ML distributed training. This role is part of the Machine Learning Applications (ML Apps) team, which works on AWS's cloud-scale machine learning accelerators, Inferentia and Trainium. You'll be responsible for developing and optimizing distributed training solutions for massive-scale language models, vision transformers, and other ML models.
The position is within Annapurna Labs, which AWS acquired in 2015 and which serves as AWS's infrastructure provider. You'll work alongside chip architects, compiler engineers, and runtime engineers to create cutting-edge distributed training solutions for Trn2 and Trn1 systems. The role requires expertise in both software development and machine learning, particularly with distributed training libraries such as FSDP and DeepSpeed.
AWS offers an inclusive team culture with ten employee-led affinity groups and various learning experiences. The team values work-life balance, offering flexible working hours and supporting professional growth through mentorship and knowledge sharing. You'll be part of a diverse team working on revolutionary cloud infrastructure products that impact millions of users worldwide.
This is an opportunity to work with cutting-edge ML technology, contribute to high-impact projects, and shape the future of cloud-based machine learning infrastructure. The role combines technical depth in ML systems with the scale and impact of AWS's cloud platform, making it ideal for engineers passionate about both software development and machine learning.