Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs designs silicon and software that accelerate innovation for AWS, creating cloud solutions and custom chips for machine learning.
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Join Amazon's innovative Annapurna Labs team as a Senior Software Engineer working on AWS Neuron, the complete software stack for the AWS Trainium and Inferentia cloud-scale ML accelerators. This role focuses on distributed training development for cutting-edge ML models, including large language models (LLMs), Stable Diffusion, and Vision Transformers. You'll collaborate with chip architects and software engineers to optimize performance on custom AWS silicon.

The position offers an excellent opportunity to work at the intersection of machine learning and hardware acceleration, developing solutions that push the boundaries of what's possible in AI training at scale. You'll be part of a team that values knowledge-sharing, mentorship, and career growth, working in an inclusive environment that celebrates diverse experiences and perspectives.

As part of AWS, you'll contribute to the world's most comprehensive cloud platform and help pioneer innovations in cloud computing. The role offers competitive compensation ranging from $151,300 to $261,500 based on location, plus equity and comprehensive benefits. The team maintains a strong focus on work-life harmony and provides extensive support for professional development.

Key technologies you'll work with include PyTorch, JAX, XLA, FSDP, DeepSpeed, and NeMo, while developing solutions for AWS's custom ML accelerators such as Trainium and Inferentia. This is an opportunity to make a significant impact on the future of machine learning infrastructure while working with some of the most advanced AI hardware and software systems available.

Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training support into PyTorch and JAX using XLA
  • Optimize models for peak performance on AWS custom silicon
  • Work with chip architects, compiler engineers and runtime engineers
  • Create, build and tune distributed training solutions with Trainium instances
  • Develop and enable ML model families including LLMs, Stable Diffusion, and Vision Transformers

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Python
  • Bachelor's degree in computer science or equivalent
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience
  • 5+ years of experience leading design or architecture
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Experience in machine learning, data mining, statistics or natural language processing

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Medical insurance
  • 401(k)
  • Full range of medical benefits
  • Financial benefits
  • Work-life harmony
  • Mentorship and career growth opportunities
