Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs designs silicon and software that accelerates innovation for AWS cloud solutions.
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs, an Amazon company, is seeking a Senior Software Engineer to join its AWS Neuron Distributed Training team. This role focuses on developing and optimizing distributed training solutions for AWS's custom ML accelerators, Trainium and Inferentia.

The position involves working with Large Language Models (LLMs) such as GPT and Llama, as well as other ML model families such as Stable Diffusion and Vision Transformers. You'll collaborate with chip architects, compiler engineers, and runtime engineers to build and optimize distributed training solutions.

As part of AWS, you'll be working with the world's most comprehensive cloud platform, helping to pioneer new innovations in cloud computing. The team maintains a strong culture of mentorship and knowledge-sharing, with opportunities for career growth and development.

The role offers competitive compensation ranging from $151,300 to $261,500 based on location, plus equity and comprehensive benefits. You'll be part of a diverse, inclusive environment that values work-life harmony and embraces unique perspectives.

Key technical aspects include working with PyTorch, JAX, XLA, and distributed training libraries such as FSDP, DeepSpeed, and NeMo. You'll be responsible for optimizing performance on AWS custom silicon and ensuring efficient model training at scale.

The ideal candidate should have strong software development skills, deep technical expertise in machine learning, and the ability to work effectively in cross-functional teams. This is an opportunity to shape the future of ML infrastructure at AWS while working with some of the most advanced AI/ML technologies available.

Join a team that's dedicated to innovation, values continuous learning, and offers the chance to work on challenging problems at global scale. Your work will directly impact AWS customers' ability to train and deploy large-scale machine learning models efficiently.


Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training support into PyTorch and JAX using XLA
  • Optimize models for peak performance on AWS custom silicon
  • Work with chip architects, compiler engineers and runtime engineers
  • Create, build and tune distributed training solutions with Trainium instances
  • Develop and enable performance tuning of ML model families including LLMs

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Python
  • Bachelor's degree in computer science or equivalent
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience
  • 5+ years of leading design or architecture experience
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Experience in machine learning, data mining, statistics or natural language processing

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Medical insurance
  • 401(k)
  • Full range of medical benefits
  • Financial benefits
  • Work-life harmony
  • Mentorship and career growth opportunities

