AWS Neuron is seeking a Senior Software Engineer to join its Machine Learning Applications (ML Apps) team, focusing on distributed training solutions. This role combines deep software engineering expertise with machine learning knowledge to develop and optimize ML frameworks for AWS's custom silicon. You'll work on AWS Neuron, the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators.
The position involves working with cutting-edge ML models, including large language models such as GPT-2/3 as well as Stable Diffusion and Vision Transformers. You'll collaborate with chip architects and engineers to build distributed training solutions using technologies like FSDP and DeepSpeed. The role requires expertise in both software development and machine learning, particularly in Python-based frameworks.
As part of AWS Utility Computing, you'll contribute to foundational services that power cloud computing worldwide. The team culture emphasizes learning, diversity, and work-life harmony. Amazon offers comprehensive benefits, mentorship opportunities, and strong career growth potential.
Key responsibilities include implementing distributed training support across major ML frameworks, optimizing model performance on custom silicon, and leading technical initiatives. The ideal candidate brings 5+ years of software development experience, strong ML knowledge, and leadership experience.
This role offers the opportunity to work on next-generation AI infrastructure at scale, with competitive compensation ranging from $151,300 to $261,500 based on location, plus equity and comprehensive benefits. Join us in shaping the future of machine learning infrastructure at AWS.