Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs designs silicon and software that accelerates innovation for AWS, creating cloud solutions and custom chips.
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs, an Amazon company, is seeking a Senior Software Engineer to join its AWS Neuron Distributed Training team. The role sits at the intersection of AI/ML technology and cloud infrastructure, and focuses on developing and optimizing distributed training solutions for AWS's custom silicon accelerators.

The position involves working with state-of-the-art machine learning models, including large language models (LLMs) such as GPT and Llama, as well as Stable Diffusion and Vision Transformers. You'll be responsible for implementing distributed training support in major frameworks such as PyTorch and JAX, while collaborating closely with chip architects and compiler engineers to maximize performance on AWS's custom silicon platforms.

As a senior engineer, you'll lead technical initiatives and work with cross-functional teams to solve complex challenges in machine learning infrastructure. The role requires deep expertise in both software development and machine learning, with a focus on distributed systems and performance optimization.

The team operates within AWS's innovative culture, emphasizing mentorship, knowledge-sharing, and career growth. You'll be part of an organization that values diverse experiences and perspectives, with access to various employee-led affinity groups and ongoing learning opportunities.

AWS offers a comprehensive benefits package, including competitive base pay ranging from $151,300 to $261,500 depending on location, plus equity and other compensation components. The company emphasizes work-life harmony and provides extensive resources for professional development.

This is an excellent opportunity for experienced software engineers passionate about machine learning to work on cutting-edge technology that powers some of the world's most advanced AI infrastructure. You'll be contributing to solutions that help customers solve previously unimaginable challenges while working with a team that's dedicated to innovation and technical excellence.

Last updated 2 days ago

Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training support into PyTorch and JAX using XLA
  • Optimize models to achieve peak performance on AWS custom silicon
  • Work with chip architects, compiler engineers and runtime engineers
  • Create, build and tune distributed training solutions with Trainium instances
  • Develop, enable and tune the performance of ML model families

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Python
  • Bachelor's degree in computer science or equivalent
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience with at least one software programming language
  • 5+ years of experience leading design or architecture
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Experience in machine learning, data mining, information retrieval, statistics or natural language processing

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance
401k
  • Medical benefits
  • Financial benefits
  • Work-life harmony
  • Career growth opportunities
  • Mentorship programs
