
Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training - Multimodal

AWS Utility Computing provides product innovations for cloud services, with Annapurna Labs designing silicon and software that accelerates innovation.
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training - Multimodal

AWS Utility Computing (UC) is at the forefront of cloud innovation, developing and managing critical services across Compute, Database, Storage, and Platform solutions. This senior role is within the Machine Learning Applications (ML Apps) team for AWS Neuron, which focuses on the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators. The position involves working with cutting-edge AI technologies, including large language models such as GPT-2 and GPT-3, as well as vision transformers.

The role combines deep technical expertise in distributed systems with machine learning, requiring collaboration with chip architects and compiler engineers. You'll be responsible for developing and optimizing distributed training solutions using frameworks like PyTorch, TensorFlow, and JAX, while ensuring maximum performance on AWS's custom silicon.

The team culture emphasizes knowledge-sharing and mentorship, with senior members providing one-on-one guidance and thorough code reviews. AWS values diverse experiences and backgrounds, fostering an inclusive environment through employee-led affinity groups and ongoing learning opportunities.

Working at AWS means joining a team that's pioneering cloud computing innovation. You'll have access to career growth resources, mentorship opportunities, and a strong work-life harmony culture. The position offers competitive compensation, including base pay ranging from $151,300 to $261,500 depending on location, plus potential equity and comprehensive benefits.

This role is perfect for experienced engineers passionate about machine learning infrastructure who want to impact how the world's most advanced AI models are trained and deployed at scale. You'll be at the intersection of hardware and software, working with custom chips while building the future of distributed AI training systems.


Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training - Multimodal

  • Lead efforts to build distributed training and inference support into PyTorch, TensorFlow, and JAX
  • Work with chip architects, compiler engineers and runtime engineers
  • Create, build and tune distributed training solutions with Trn1
  • Tune the performance of ML model families including GPT-2, GPT-3, Stable Diffusion, and Vision Transformers
  • Ensure the highest performance and maximum efficiency on AWS Trainium and Inferentia silicon

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training - Multimodal

Python
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience with at least one programming language
  • 5+ years of experience leading the design or architecture of new and existing systems
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Bachelor's degree in computer science or equivalent (preferred)

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training - Multimodal

Medical Insurance
  • Medical benefits
  • Financial benefits
  • Comprehensive benefits package
