Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Utility Computing provides product innovations and cloud services including S3, EC2, and other foundational AWS services.
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Neuron is seeking a Senior Software Engineer to join their Machine Learning Applications (ML Apps) team, focusing on distributed training solutions. This role combines deep software engineering expertise with machine learning knowledge to develop and optimize ML frameworks for AWS's custom silicon. You'll work on AWS Neuron, the complete software stack for AWS Inferentia and Trainium cloud-scale machine learning accelerators.

The position involves working with cutting-edge ML models, including large language models like GPT-2 and GPT-3, Stable Diffusion, and Vision Transformers. You'll collaborate with chip architects and engineers to build distributed training solutions using technologies like FSDP and DeepSpeed. The role requires expertise in both software development and machine learning, particularly in Python-based frameworks.

As part of AWS Utility Computing, you'll contribute to foundational services that power cloud computing worldwide. The team culture emphasizes learning, diversity, and work-life harmony. Amazon offers comprehensive benefits, mentorship opportunities, and strong career growth potential.

Key responsibilities include implementing distributed training support across major ML frameworks, optimizing model performance on custom silicon, and leading technical initiatives. The ideal candidate brings 5+ years of software development experience, strong ML knowledge, and leadership experience.

This role offers the opportunity to work on next-generation AI infrastructure at scale, with competitive compensation ranging from $151,300 to $261,500 based on location, plus equity and comprehensive benefits. Join us in shaping the future of machine learning infrastructure at AWS.

Last updated 13 days ago

Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts building distributed training and inference support into PyTorch, TensorFlow, and JAX
  • Work with chip architects, compiler engineers and runtime engineers
  • Create, build and tune distributed training solutions
  • Tune the performance of ML model families including GPT-2, GPT-3, Stable Diffusion, and Vision Transformers
  • Ensure the highest performance and maximize efficiency on AWS Trainium and Inferentia silicon

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Python
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming with at least one software programming language
  • 5+ years of leading design or architecture experience
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Bachelor's degree in computer science or equivalent (preferred)
  • Machine learning knowledge of frameworks and end-to-end model training

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Medical insurance
  • 401(k)
  • Vision insurance
  • Dental insurance
  • Parental leave
  • Work-life harmony
  • Mentorship opportunities
  • Career growth resources
  • Comprehensive benefits package
