Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Amazon

AWS Utility Computing provides product innovations and cloud solutions, with Annapurna Labs designing silicon and software that accelerates innovation.

Seattle, WA, USA • Cupertino, CA, USA

$151,300 - $261,500

Machine Learning

Senior Software Engineer

In-Person

5,000+ Employees

5+ years of experience

AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Utility Computing (UC) is at the forefront of cloud innovation, providing groundbreaking products and services that distinguish AWS in the industry. This senior role within the Machine Learning Applications (ML Apps) team for AWS Neuron focuses on developing and optimizing distributed training solutions for cutting-edge AI models.

The position involves working with AWS's custom silicon accelerators (Trainium and Inferentia) and their corresponding server implementations (Trn1 and Inf1). You'll be responsible for creating high-performance distributed training solutions for large-scale models such as GPT-2/3, Stable Diffusion, and Vision Transformers.

As a senior engineer, you'll collaborate across teams with chip architects, compiler engineers, and runtime specialists to build and enhance distributed training capabilities in major frameworks like PyTorch, TensorFlow, and JAX. The role requires deep expertise in both software development and machine learning, with a focus on distributed training technologies like FSDP and DeepSpeed.

The team culture emphasizes knowledge-sharing and mentorship, with senior members actively participating in code reviews and one-on-one mentoring. AWS values diverse experiences and backgrounds, fostering an inclusive environment through employee-led affinity groups and ongoing learning opportunities.

Career growth is strongly supported, with resources for knowledge-sharing and professional development. The company emphasizes work-life harmony and provides comprehensive benefits including competitive base pay, equity compensation, and various medical and financial benefits.

This position offers an opportunity to work on breakthrough AI/ML technologies while being part of Amazon's larger mission to be Earth's Best Employer. The role combines technical leadership with hands-on development, making it ideal for experienced engineers passionate about advancing the field of machine learning infrastructure.

Last updated 18 days ago

Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Lead efforts to build distributed training and inference support into PyTorch, TensorFlow, and JAX
Tune ML models for highest performance on AWS Trainium and Inferentia silicon
Work with chip architects, compiler engineers and runtime engineers
Develop and enable ML model families including GPT-2, GPT-3, Stable Diffusion, and Vision Transformers

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Python

5+ years of non-internship professional software development experience
5+ years of programming experience in at least one programming language
5+ years of experience leading design or architecture
5+ years of full software development life cycle experience
Experience as a mentor, tech lead or leading an engineering team
Bachelor's degree in computer science or equivalent (preferred)
Knowledge of machine learning frameworks and end-to-end model training

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance

401k

Medical, financial, and other benefits
Equity compensation
Sign-on payments
Work-life harmony
Mentorship and career growth opportunities
Inclusive team culture
