Software Engineer- AI/ML, AWS Neuron Machine Learning Distributed Training, ML Accuracy

Amazon

Amazon is a global technology company providing cloud computing, e-commerce, AI, and digital streaming services.

Cupertino, CA, USA

$129,300 - $223,600

Machine Learning

Mid-Level Software Engineer

In-Person

5,000+ Employees

3+ years of experience

AI · Enterprise SaaS

Description For Software Engineer- AI/ML, AWS Neuron Machine Learning Distributed Training, ML Accuracy

Join the AWS Neuron team as a Software Engineer focused on AI/ML distributed training. This role is part of the Machine Learning Applications (ML Apps) team, which works on AWS's cloud-scale machine learning accelerators, Inferentia and Trainium. You'll be responsible for developing and optimizing distributed training solutions for massive-scale language models, vision transformers, and other ML models.

The position is within Annapurna Labs, which AWS acquired in 2015 and which serves as AWS's infrastructure provider. You'll work alongside chip architects, compiler engineers, and runtime engineers to create cutting-edge distributed training solutions for Trn2 and Trn1 systems. The role requires expertise in both software development and machine learning, particularly with distributed training libraries such as FSDP and DeepSpeed.

AWS offers an inclusive team culture with ten employee-led affinity groups and various learning experiences. The team values work-life balance, offering flexible working hours and supporting professional growth through mentorship and knowledge sharing. You'll be part of a diverse team working on revolutionary cloud infrastructure products that impact millions of users worldwide.

This is an opportunity to work with cutting-edge ML technology, contribute to high-impact projects, and shape the future of cloud-based machine learning infrastructure. The role combines technical depth in ML systems with the scale and impact of AWS's cloud platform, making it ideal for engineers passionate about both software development and machine learning.

Last updated 10 hours ago

Responsibilities For Software Engineer- AI/ML, AWS Neuron Machine Learning Distributed Training, ML Accuracy

Build distributed training support into PyTorch, TensorFlow, and JAX
Develop and maintain Neuron compiler and runtime stacks
Tune ML models for highest performance
Work with chip architects and compiler engineers
Enable and performance-tune various ML model families, including LLMs

Requirements For Software Engineer- AI/ML, AWS Neuron Machine Learning Distributed Training, ML Accuracy

Python

3+ years of non-internship professional software development experience
2+ years of non-internship design or architecture experience
Programming experience in at least one programming language
Deep Learning industry experience

Benefits For Software Engineer- AI/ML, AWS Neuron Machine Learning Distributed Training, ML Accuracy

Work-life balance
Flexible working hours
Mentorship & Career Growth
Medical benefits
401k

Jobs Related To Amazon Software Engineer- AI/ML, AWS Neuron Machine Learning Distributed Training, ML Accuracy

Systems Engineer, AI/ML

Amazon

Systems Engineer position at AWS focusing on AI/ML services, combining cloud infrastructure expertise with artificial intelligence systems support.

Software Engineer- AI/ML, AWS Neuron

Amazon

Software Engineer position for AWS Neuron team working on AI/ML infrastructure and distributed training solutions.

Software Engineer- AI/ML, AWS Neuron Distributed Training

Amazon

Senior Software Engineer position at AWS Neuron focusing on distributed training solutions for machine learning, working with cutting-edge ML accelerators and frameworks.

Software Development Engineer, Ring AI

Amazon

Software Development Engineer position at Ring AI (Amazon) in Iasi, Romania, focusing on computer vision and machine learning infrastructure for smart home security solutions.

Systems Development Engineer, AI/ML

Amazon

Systems Development Engineer position at AWS focusing on AI/ML services, involving cloud infrastructure automation, system operations, and development of large-scale distributed systems.