Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS infrastructure provider specializing in silicon engineering, hardware design, and ML accelerators
$129,300 - $223,600
Machine Learning
Mid-Level Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Neuron is seeking a Software Engineer to join its Machine Learning Applications team, focusing on distributed training solutions. This role at Annapurna Labs, an AWS company, involves working on the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators. The position combines deep software development expertise with machine learning, requiring experience with frameworks such as PyTorch or TensorFlow and with distributed training libraries.

The role offers an opportunity to work on cutting-edge ML infrastructure, developing solutions for massive-scale language models and various other ML applications. You'll collaborate with cross-functional teams, including chip architects and compiler engineers, to optimize performance on AWS's custom silicon.

AWS provides a collaborative environment with a strong emphasis on work-life balance and career growth. The team culture promotes diversity and inclusion, with various employee-led affinity groups and ongoing learning experiences. You'll have opportunities for mentorship and knowledge sharing within a team of varied experience levels.

The position offers competitive compensation based on geographic location, plus equity and comprehensive benefits. You'll be part of AWS's mission to revolutionize cloud infrastructure while working on technologies that impact millions of users worldwide. This role is perfect for someone passionate about both software engineering and machine learning, with a desire to work on large-scale distributed systems.

Last updated 6 days ago

Responsibilities For Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Build distributed training support into PyTorch and TensorFlow
  • Tune ML models for highest performance on AWS Trainium and Inferentia silicon
  • Work with chip architects, compiler engineers and runtime engineers
  • Develop and enable ML model families including GPT-2, GPT-3, Stable Diffusion, and Vision Transformers
  • Create and optimize distributed training solutions with Trn1

Requirements For Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Python
  • 3+ years of non-internship professional software development experience
  • 3+ years of non-internship design or architecture experience
  • Experience programming with at least one software programming language
  • Deep Learning industry experience
  • Bachelor's degree in computer science or equivalent (preferred)

Benefits For Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Medical Insurance
  • Work-Life Balance
  • Mentorship Program
  • Career Growth Opportunities
