Software Development Engineer - AI, Hyperpod Engines

World's most comprehensive and broadly adopted cloud platform, pioneering cloud computing and continuous innovation.
$129,300 - $223,600
Machine Learning
Mid-Level Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For Software Development Engineer - AI, Hyperpod Engines

AWS Utility Computing (UC) is at the forefront of cloud innovation, providing foundational services like S3 and EC2. The Hyperpod Engines team builds a resilient platform for deep learning training through Amazon SageMaker HyperPod, which scales and accelerates generative AI model development across thousands of AI accelerators.

As a Software Development Engineer on this team, you'll work with cutting-edge AI technologies, developing training frameworks and communication libraries. You'll be hands-on with frameworks like PyTorch, NeMo, and Megatron, and with collective communications libraries such as NCCL. A significant part of your work will involve training and fine-tuning large language models like Llama.

The position offers an exciting opportunity to work in a fast-paced, cross-disciplinary environment alongside engineers and researchers who are leaders in the field. You'll tackle challenging problems, develop innovative solutions, and deliver production-ready implementations that directly impact customer-facing products.

Amazon offers a comprehensive benefits package and values work-life harmony. The company is committed to diversity and inclusion, providing various employee-led affinity groups and inclusion events. Career growth opportunities include extensive knowledge-sharing and mentorship programs.

The role is based in Santa Clara, CA, and offers competitive compensation ranging from $129,300 to $223,600 per year, depending on location and experience. This is an excellent opportunity for someone with strong software development skills who wants to work at the intersection of cloud computing and artificial intelligence.


Responsibilities For Software Development Engineer - AI, Hyperpod Engines

  • Developing training frameworks and communication libraries
  • Working on training frameworks like PyTorch, NeMo, and Megatron, and on collective communications libraries
  • Developing software to train and fine-tune large language models
  • Creating and modifying large or significant sets of components
  • Developing model training optimizations like context parallelism, pipeline parallelism, and tensor parallelism

Requirements For Software Development Engineer - AI, Hyperpod Engines

  • Python
  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship design or architecture experience
  • Experience programming with at least one software programming language

Benefits For Software Development Engineer - AI, Hyperpod Engines

  • Medical Insurance
  • 401k


Jobs Related To Amazon Software Development Engineer - AI, Hyperpod Engines

Software Engineer - AI/ML, AWS Neuron

Software Engineer position on the AWS Neuron team, focusing on ML infrastructure development and optimization for cloud-scale machine learning accelerators.

Software Development Engineer II, Rufus Engineering

Build scalable AI-powered shopping experiences at Amazon as a Software Development Engineer II, developing conversational capabilities using large language models and distributed systems.

Software Development Eng II, Appstore Quality Tech

Software Development Engineer II position at Amazon's Appstore Quality team, focusing on AI/ML-based automation of app certification processes and quality checks.

SDE II - Perception & Planning, Last Mile Delivery Automation

SDE II position at Amazon's Last Mile Delivery Automation team, focusing on developing autonomous delivery solutions using AI, robotics, and advanced perception systems.

Machine Learning Engineer II, Just Walk Out (JWO)

Machine Learning Engineer II position at Amazon's Just Walk Out team, developing AI models for checkout-free shopping technology using AWS and advanced ML tools.