Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

A division of AWS Utility Computing that designs silicon and software to accelerate cloud innovation through custom chips, accelerators, and software stacks.
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Utility Computing (UC) provides product innovations that continue to set AWS's services and features apart in the industry. This senior role is part of the AWS Neuron team, focusing on distributed training for cloud-scale Machine Learning accelerators. The position involves working with AWS Inferentia and Trainium, our custom ML accelerators, developing and optimizing solutions for large-scale ML models including LLMs like GPT and Llama.

The role combines deep software engineering expertise with machine learning knowledge. It requires work with frameworks such as PyTorch and TensorFlow and with distributed training libraries such as FSDP and DeepSpeed. You'll collaborate with chip architects and compiler engineers to optimize performance on custom silicon.

Annapurna Labs, acquired by AWS in 2015, is fundamental to AWS's infrastructure, delivering products like AWS Nitro, Graviton, and ML Accelerators. The team emphasizes knowledge-sharing, mentorship, and career growth, supporting members through code reviews and development opportunities.

AWS values diverse experiences and maintains an inclusive culture through employee-led affinity groups and ongoing learning experiences. Work-life harmony is prioritized, ensuring success at work doesn't compromise personal life. The position offers comprehensive benefits, equity compensation, and competitive salary based on location and experience.

Key responsibilities include leading distributed training support development, performance tuning of ML models, and working across teams to optimize solutions for AWS's custom silicon. The role requires strong software development skills combined with machine learning expertise, making it ideal for candidates passionate about pushing the boundaries of cloud computing and AI technology.

Last updated 8 hours ago

Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training support into PyTorch and TensorFlow using XLA
  • Develop and manage Neuron compiler and runtime stacks
  • Tune ML models for highest performance on AWS Trainium and Inferentia silicon
  • Work with chip architects, compiler engineers and runtime engineers
  • Create and optimize distributed training solutions for large-scale ML models

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Python · Java
  • Bachelor's degree in computer science or equivalent
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience in at least one programming language
  • 5+ years of experience leading design or architecture
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Experience in machine learning, data mining, information retrieval, statistics or natural language processing

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance · 401(k) · Vision Insurance · Dental Insurance
  • Medical, financial, and other benefits
  • Equity compensation
  • Sign-on payments
  • Comprehensive benefits package
  • Career growth opportunities
  • Mentorship programs
  • Work-life harmony
