Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs designs silicon and software that accelerates innovation for AWS cloud solutions.
$129,300 - $223,600
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs, an Amazon company, is seeking a Senior Machine Learning Engineer to join its AWS Neuron Distributed Training team. This role focuses on developing and optimizing machine learning solutions for AWS's custom silicon accelerators, Trainium and Inferentia. The position involves working with cutting-edge ML technologies, including Large Language Models (LLMs) such as GPT and Llama, as well as other ML model families such as Stable Diffusion and Vision Transformers.

The role requires expertise in distributed training frameworks and collaboration with cross-functional teams of chip architects, compiler engineers, and runtime engineers. You'll be responsible for implementing distributed training support in major frameworks like PyTorch and JAX, while optimizing performance on AWS's custom silicon platforms.

Annapurna Labs, acquired by AWS in 2015, has a strong track record of delivering innovative infrastructure solutions including AWS Nitro, Graviton, and ML accelerators. The team culture emphasizes knowledge-sharing, mentorship, and continuous learning, with a strong focus on work-life harmony and career development.

The position offers competitive compensation ranging from $129,300 to $223,600 based on location and experience, plus additional benefits. The team values diverse experiences and backgrounds, fostering an inclusive environment through employee-led affinity groups and ongoing learning opportunities.

This is an excellent opportunity for experienced software engineers with ML expertise to work on cutting-edge technology that powers AWS's machine learning infrastructure, while being part of a supportive team that prioritizes both technical excellence and professional growth.

Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training support into PyTorch and JAX using XLA
  • Optimize models to achieve peak performance on AWS custom silicon
  • Work with chip architects, compiler engineers, and runtime engineers
  • Develop and tune distributed training solutions on Trainium instances
  • Enable and performance-tune various ML model families, including LLMs

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • 3+ years of non-internship professional software development experience
  • 2+ years of system design/architecture experience
  • Experience programming in at least one programming language
  • Experience with training large ML models using Python
  • Knowledge of distributed training libraries such as FSDP, DeepSpeed, and NeMo

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Medical Insurance
  • Work-life balance
  • Mentorship opportunities
  • Career growth opportunities
  • Employee-led affinity groups
