Software Engineer- AI/ML, AWS Neuron Distributed Training

Amazon

AWS infrastructure provider specializing in silicon engineering, hardware design, software, and operations.

Cupertino, CA, USA

$129,300 - $223,600

Machine Learning

Mid-Level Software Engineer

In-Person

5,000+ Employees

3+ years of experience

AI · Enterprise SaaS

Description For Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Neuron is seeking a Software Engineer to join its Machine Learning Applications team, focusing on distributed training solutions. The role involves working with AWS's innovative ML accelerators, Inferentia and Trainium, and their corresponding servers (Inf1 and Trn1). You'll be responsible for developing and optimizing distributed training support for a range of ML models, including large language models such as GPT-2/3 and vision transformers.

The position sits within Annapurna Labs, an AWS infrastructure provider acquired by Amazon in 2015, which has delivered numerous successful products, including AWS Nitro, ENA, EFA, Graviton, and F1 EC2 instances. You'll work alongside chip architects, compiler engineers, and runtime engineers to create and optimize distributed training solutions using technologies such as FSDP and DeepSpeed.

The role combines deep technical expertise in both software development and machine learning, with a focus on performance optimization and scalability. You'll be part of a team that values work-life balance, mentorship, and career growth, with opportunities to work on cutting-edge ML infrastructure that impacts millions of users worldwide.

Amazon offers a comprehensive benefits package and a culture that embraces diversity through various employee-led affinity groups. The company's 16 Leadership Principles emphasize seeking diverse perspectives, continuous learning, and earning trust. The team supports flexible working hours and maintains a balanced approach to professional and personal life.

This is an excellent opportunity for someone passionate about ML infrastructure, distributed systems, and high-performance computing, with the chance to work on technology that powers some of the most advanced ML applications in the cloud computing industry.

Responsibilities For Software Engineer- AI/ML, AWS Neuron Distributed Training

Build distributed training support into PyTorch and TensorFlow using XLA
Develop and tune ML models for the highest performance on AWS Trainium and Inferentia silicon
Work with chip architects, compiler engineers and runtime engineers
Create and optimize distributed training solutions with Trn1
Enable and performance tune various ML model families including LLMs and vision models

Requirements For Software Engineer- AI/ML, AWS Neuron Distributed Training

Python

3+ years of non-internship professional software development experience
3+ years of system design and architecture experience
Experience programming with at least one software programming language
Deep Learning industry experience
Experience with full software development life cycle
Bachelor's degree in computer science or equivalent (preferred)
Experience with PyTorch/JAX/TensorFlow (preferred)

Benefits For Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance

Mental Health Assistance

Work-life balance
Mentorship opportunities
Career growth opportunities
Medical benefits
Employee-led affinity groups
Flexible working hours
