Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Utility Computing provides product innovations and cloud solutions, with Annapurna Labs designing silicon and software that accelerates innovation.
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS
This job posting may no longer be active. You may be interested in these related jobs instead:
Senior Machine Learning Engineer, Bedrock

Senior Machine Learning Engineer position at Amazon's AWS Bedrock team, focusing on GenAI model optimization and inference efficiency.

Senior Machine Learning Engineer, Bedrock

Senior Machine Learning Engineer position at Amazon's Bedrock team, focusing on developing and optimizing GenAI models and inference engines.

Senior Software Development Engineer - GenAI, Amazon Ads - Creative X

Senior Software Engineering role at Amazon Ads focusing on developing AI-based systems for creative content optimization and advertising technology.

ML Compiler Engineer, AWS Neuron, Annapurna Labs

ML Compiler Engineer position at AWS Neuron team, focusing on optimizing deep learning and GenAI workloads for custom ML accelerators through kernel development and compiler optimization.

Software Development Engineer, Finance Intelligence

Senior Software Engineer role at Amazon Finance Technology building AI/ML solutions for financial services and automation.

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Utility Computing (UC) is at the forefront of cloud innovation, providing groundbreaking products and services that distinguish AWS in the industry. This senior role within the Machine Learning Applications (ML Apps) team for AWS Neuron focuses on developing and optimizing distributed training solutions for cutting-edge AI models.

The position involves working with AWS's custom silicon accelerators (Inferentia and Trainium) and their corresponding server instances (Inf1 and Trn1). You'll be responsible for creating high-performance distributed training solutions for large-scale models such as GPT-2/3, Stable Diffusion, and Vision Transformers.

As a senior engineer, you'll collaborate across teams with chip architects, compiler engineers, and runtime specialists to build and enhance distributed training capabilities in major frameworks like PyTorch, TensorFlow, and JAX. The role requires deep expertise in both software development and machine learning, with a focus on distributed training technologies like FSDP and DeepSpeed.

The team culture emphasizes knowledge-sharing and mentorship, with senior members actively participating in code reviews and one-on-one mentoring. AWS values diverse experiences and backgrounds, fostering an inclusive environment through employee-led affinity groups and ongoing learning opportunities.

Career growth is strongly supported, with resources for knowledge-sharing and professional development. The company emphasizes work-life harmony and provides comprehensive benefits including competitive base pay, equity compensation, and various medical and financial benefits.

This position offers an opportunity to work on breakthrough AI/ML technologies while being part of Amazon's larger mission to be Earth's Best Employer. The role combines technical leadership with hands-on development, making it ideal for experienced engineers passionate about advancing the field of machine learning infrastructure.

Last updated 18 days ago

Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training and inference support into PyTorch, TensorFlow, and JAX
  • Tune ML models for highest performance on AWS Trainium and Inferentia silicon
  • Work with chip architects, compiler engineers and runtime engineers
  • Develop and enable ML model families including GPT-2, GPT-3, Stable Diffusion, and Vision Transformers

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Python
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming with at least one software programming language
  • 5+ years of leading design or architecture experience
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Bachelor's degree in computer science or equivalent (preferred)
  • Machine learning knowledge, including frameworks and end-to-end model training

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance
401k
  • Medical, financial, and other benefits
  • Equity compensation
  • Sign-on payments
  • Work-life harmony
  • Mentorship and career growth opportunities
  • Inclusive team culture
