
Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Utility Computing provides product innovations for cloud services, specializing in silicon and software development for machine learning acceleration.
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Utility Computing (UC) is at the forefront of cloud innovation, developing cutting-edge solutions through Annapurna Labs. This senior role focuses on the AWS Neuron software stack, working with AWS Inferentia and Trainium cloud-scale machine learning accelerators. The position combines advanced software engineering with machine learning expertise, requiring deep knowledge of distributed training systems and major ML frameworks.

The role involves collaborating with a diverse team of chip architects and engineers to optimize performance for large language models and AI systems. You'll be working on groundbreaking technology that powers AWS's machine learning infrastructure, including work with models like GPT-2, GPT-3, and Stable Diffusion.

AWS offers a comprehensive benefits package, including competitive salary, equity compensation, and full medical coverage. The team culture emphasizes mentorship, knowledge-sharing, and work-life harmony. You'll be part of an inclusive environment that values diverse experiences and perspectives.

The position provides unique opportunities to work with cutting-edge AI/ML technology while contributing to systems that help customers solve previously impossible challenges. You'll have access to continuous learning opportunities, career development resources, and the chance to work with some of the most advanced cloud computing and AI technologies in the industry.

This role is ideal for experienced engineers who are passionate about machine learning, distributed systems, and high-performance computing. You'll be part of AWS's mission to innovate in the cloud computing space while enjoying the benefits of working for a global technology leader.

Last updated 17 days ago

Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training and inference support into PyTorch, TensorFlow, and JAX
  • Work with chip architects, compiler engineers and runtime engineers
  • Create, build and tune distributed training solutions
  • Tune performance of ML model families including GPT-2, GPT-3, Stable Diffusion, and Vision Transformers
  • Ensure the highest performance and maximum efficiency on AWS Trainium and Inferentia silicon

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Python
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming with at least one software programming language
  • 5+ years of leading design or architecture experience
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Bachelor's degree in computer science or equivalent (preferred)
  • Knowledge of machine learning frameworks and end-to-end model training

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Full range of medical benefits
  • Financial benefits
  • 401k
  • Equity compensation
  • Sign-on payments
  • Flexible work arrangements
  • Career growth opportunities
  • Mentorship programs
