Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, pioneering cloud computing innovation.
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs, now fully integrated with AWS after its 2015 acquisition, is seeking a Senior Machine Learning Engineer for its Distributed Training team. This role focuses on AWS Neuron, the complete software stack for AWS Trainium and Inferentia cloud-scale Machine Learning accelerators. The position involves working with cutting-edge ML technologies, including Large Language Models like GPT and Llama, as well as Stable Diffusion and Vision Transformers.

The role demands expertise in distributed training libraries such as FSDP, DeepSpeed, and NeMo, with a focus on extending these capabilities for Neuron-based systems. You'll collaborate with cross-functional teams, including chip architects and compiler engineers, to optimize performance on AWS custom silicon platforms.

AWS values diverse experiences and maintains an inclusive culture through employee-led affinity groups and ongoing learning experiences. The team emphasizes knowledge-sharing and mentorship, supporting both professional and personal growth. AWS offers competitive compensation, including equity and comprehensive benefits, reflecting its commitment to being Earth's Best Employer.

The position offers exposure to groundbreaking technology in cloud computing and machine learning, working on products that directly impact AWS's infrastructure. You'll be part of a team that has delivered significant products like AWS Nitro, Graviton, and ML Accelerators, contributing to solutions that help customers tackle previously unimaginable technical challenges.

This role presents an exceptional opportunity for experienced engineers passionate about machine learning and distributed systems to work at the forefront of cloud technology, while enjoying a supportive, inclusive work environment that values work-life harmony and continuous learning.

Last updated a month ago

Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training support into PyTorch and JAX using XLA
  • Optimize models to achieve peak performance on AWS custom silicon
  • Work with chip architects, compiler engineers and runtime engineers
  • Create, build and tune distributed training solutions with Trainium instances
  • Develop, enable and tune the performance of ML model families

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Python
Java
  • Bachelor's degree in computer science or equivalent
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming with at least one software programming language
  • 5+ years of leading design or architecture of new and existing systems
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Experience in machine learning, data mining, statistics or natural language processing

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance
Equity
Mental Health Assistance
  • Medical, financial, and other benefits
  • Equity compensation
  • Sign-on payments
  • Mentorship and career growth opportunities
  • Work-life harmony
