Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs designs silicon and software that accelerates innovation for AWS cloud solutions.
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs, an Amazon company, is seeking a Senior Software Engineer to join its AWS Neuron Distributed Training team. This role focuses on developing and optimizing machine learning solutions for AWS's custom silicon accelerators: Trainium and Inferentia. The position involves working with cutting-edge ML technologies, including large language models (LLMs) such as GPT and Llama, as well as Stable Diffusion and Vision Transformers.

The role combines deep technical expertise in machine learning with hands-on software development and requires proficiency in distributed training frameworks and libraries such as PyTorch FSDP, DeepSpeed, and NeMo. You'll work closely with cross-functional teams, including chip architects and compiler engineers, to build and optimize distributed training solutions.

AWS Neuron is the complete software stack for AWS's cloud-scale machine learning accelerators, and this position offers the opportunity to work on next-generation AI infrastructure. The team maintains a strong culture of mentorship and knowledge-sharing, with an emphasis on career growth and professional development.

As part of Amazon Web Services (AWS), the world's leading cloud platform, you'll be at the forefront of cloud computing innovation. The role offers competitive compensation, comprehensive benefits, and the chance to work on technology that powers some of the world's most successful businesses.

The ideal candidate will bring strong software development skills, deep ML expertise, and the ability to collaborate effectively across teams. This is an opportunity to shape the future of machine learning infrastructure while working with some of the most advanced AI technologies available today.


Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training support into PyTorch and JAX
  • Optimize models to achieve peak performance on AWS custom silicon
  • Work with chip architects, compiler engineers and runtime engineers
  • Create, build and tune distributed training solutions with Trainium instances
  • Develop and enable ML model families including LLMs

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Python
  • Bachelor's degree in computer science or equivalent
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience
  • 5+ years of leading design or architecture experience
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Experience in machine learning, data mining, statistics or natural language processing

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Medical benefits
  • 401k
  • Work-life harmony
  • Mentorship opportunities
  • Career growth resources
  • Employee-led affinity groups


Jobs Related To Amazon Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Software Development Engineer, SageMaker HyperPod Data Plane

Senior Software Engineer role at AWS building next-gen AI infrastructure for large-scale model training, focusing on distributed systems and machine learning platforms.

Senior Software Development Engineer, AWS Neuron Inference

Senior SDE role at AWS Neuron focusing on ML model optimization and distributed inference solutions for cloud-scale accelerators.

Sr. Machine Learning Engineer, AGIF | Finetuning

Senior Machine Learning Engineer position at Amazon's AGI Finetuning team, focusing on developing and maintaining evaluation systems for advanced AI models.

Sr. Software Development Engineer, Artificial General Intelligence

Senior Software Development Engineer role at Amazon's AGI team, focusing on developing advanced conversational AI capabilities for Alexa using LLMs and Gen AI.

Sr. Research Engineer, Machine Learning, AGI Foundations

Senior Research Engineer position at Amazon's AGI team, focusing on developing advanced multimodal ML systems and scaling pre-training workflows for LLMs and Generative AI.