Software Engineer- AI/ML, AWS Neuron Distributed Training

Amazon

AWS infrastructure provider specializing in silicon engineering, hardware design, and ML accelerators

Cupertino, CA, USA

$129,300 - $223,600

Machine Learning

Mid-Level Software Engineer

In-Person

5,000+ Employees

3+ years of experience

AI · Enterprise SaaS

Description For Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Neuron is seeking a Software Engineer to join its Machine Learning Applications team, focusing on distributed training solutions. This role at Annapurna Labs, an AWS company, involves working on the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators. The position combines deep software development expertise with machine learning and requires experience with frameworks such as PyTorch or TensorFlow and with distributed training libraries.

The role offers an opportunity to work on cutting-edge ML infrastructure, developing solutions for massive-scale language models and other ML applications. You'll collaborate with cross-functional teams, including chip architects and compiler engineers, to optimize performance on AWS's custom silicon.

AWS provides a collaborative environment with strong emphasis on work-life balance and career growth. The team culture promotes diversity and inclusion, with various employee-led affinity groups and ongoing learning experiences. You'll have opportunities for mentorship and knowledge sharing within a team of varied experience levels.

The position offers competitive compensation based on geographic location, plus equity and comprehensive benefits. You'll be part of AWS's mission to revolutionize cloud infrastructure while working on technologies that impact millions of users worldwide. This role is perfect for someone passionate about both software engineering and machine learning, with a desire to work on large-scale distributed systems.

Last updated 6 days ago

Responsibilities For Software Engineer- AI/ML, AWS Neuron Distributed Training

Build distributed training support into PyTorch and TensorFlow
Tune ML models for highest performance on AWS Trainium and Inferentia silicon
Work with chip architects, compiler engineers and runtime engineers
Develop and enable ML model families including GPT-2, GPT-3, Stable Diffusion, and Vision Transformers
Create and optimize distributed training solutions on Trn1 instances

Requirements For Software Engineer- AI/ML, AWS Neuron Distributed Training

Python

3+ years of non-internship professional software development experience
3+ years of non-internship design or architecture experience
Experience programming in at least one programming language
Deep Learning industry experience
Bachelor's degree in computer science or equivalent (preferred)

Benefits For Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance
Work-Life Balance
Mentorship Program
Career Growth Opportunities
