Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Amazon

AWS Utility Computing division that designs silicon and software to accelerate cloud innovation through custom chips, accelerators, and software stacks.

Seattle, WA, USA • Cupertino, CA, USA

$151,300 - $261,500

Machine Learning

Senior Software Engineer

In-Person

5,000+ Employees

5+ years of experience

AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Utility Computing (UC) provides product innovations that continue to set AWS's services and features apart in the industry. This senior role is part of the AWS Neuron team, focusing on distributed training for cloud-scale Machine Learning accelerators. The position involves working with AWS Inferentia and Trainium, our custom ML accelerators, developing and optimizing solutions for large-scale ML models including LLMs like GPT and Llama.

The role combines deep software engineering expertise with machine learning knowledge, requiring work with frameworks like PyTorch and TensorFlow, and distributed training libraries such as FSDP and DeepSpeed. You'll collaborate with chip architects and compiler engineers to optimize performance on custom silicon.

Annapurna Labs, acquired by AWS in 2015, is fundamental to AWS's infrastructure, delivering products like AWS Nitro, Graviton, and ML Accelerators. The team emphasizes knowledge-sharing, mentorship, and career growth, supporting members through code reviews and development opportunities.

AWS values diverse experiences and maintains an inclusive culture through employee-led affinity groups and ongoing learning experiences. Work-life harmony is prioritized, ensuring success at work doesn't compromise personal life. The position offers comprehensive benefits, equity compensation, and competitive salary based on location and experience.

Key responsibilities include leading distributed training support development, performance tuning of ML models, and working across teams to optimize solutions for AWS's custom silicon. The role requires strong software development skills combined with machine learning expertise, making it ideal for candidates passionate about pushing the boundaries of cloud computing and AI technology.

Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Lead efforts to build distributed training support into PyTorch and TensorFlow using XLA
Develop and manage Neuron compiler and runtime stacks
Tune ML models for highest performance on AWS Trainium and Inferentia silicon
Work with chip architects, compiler engineers and runtime engineers
Create and optimize distributed training solutions for large-scale ML models

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Python

Java

Bachelor's degree in computer science or equivalent
5+ years of non-internship professional software development experience
5+ years of programming with at least one software programming language
5+ years of leading design or architecture experience
5+ years of full software development life cycle experience
Experience as a mentor, tech lead or leading an engineering team
Experience in machine learning, data mining, information retrieval, statistics or natural language processing

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance

401k

Vision Insurance

Dental Insurance

Medical, financial, and other benefits
Equity compensation
Sign-on payments
Comprehensive benefits package
Career growth opportunities
Mentorship programs
Work-life harmony
