Software Development Engineer, ML Infrastructure Team

An AWS subsidiary that builds software and hardware for ML on EC2, focusing on innovations in networks, silicon, and software suites.
$129,300 - $223,600
Machine Learning
Mid-Level Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For Software Development Engineer, ML Infrastructure Team

Join the Machine Learning (ML) Infrastructure team at AWS as a Software Development Engineer, where you'll be at the forefront of building tools that ensure peak performance of AWS ML and High Performance Computing (HPC) technologies. This role is part of Annapurna Labs, a key AWS subsidiary that develops cutting-edge software and hardware solutions for ML on EC2.

As a member of our team, you'll work on critical infrastructure that monitors and optimizes large-scale testing workloads. Your responsibilities will include developing automated CI/CD pipelines, managing large clusters, and building monitoring dashboards with AWS Managed Grafana and Athena. You'll be instrumental in building solutions that detect performance regressions before they reach customers.

The position requires expertise in Python, TypeScript, and infrastructure as code (IaC) using CDK. You'll work with advanced technologies including SLURM, Active Directory, and various AWS services. The role offers competitive compensation ranging from $129,300 to $223,600 based on location and experience, plus comprehensive benefits.

This is an excellent opportunity for engineers passionate about ML infrastructure who want to make a significant impact on AWS's ML and HPC capabilities. You'll be working with innovative technologies like Trainium, Neuron, and Elastic Fabric Adapter (EFA), helping to make AWS the premier platform for AI workloads at scale.

The ideal candidate brings strong experience in software development, CI/CD automation, and ML/HPC systems. You'll need to be comfortable managing complex infrastructure across multiple instance types and operating systems, while maintaining high standards for code quality and automation. Join us in shaping the future of cloud-based machine learning infrastructure at AWS.

Responsibilities For Software Development Engineer, ML Infrastructure Team

  • Build and maintain infrastructure that monitors and reports on functionality and performance of testing workloads
  • Use CI/CD tools and AWS products to automate software delivery
  • Write Python code for managing large clusters and running ML/HPC workloads
  • Create dashboards using AWS Managed Grafana and Athena
  • Develop automatic mechanisms for detecting functional and performance regressions
  • Manage complex infrastructure across multiple instance types and software stacks

Requirements For Software Development Engineer, ML Infrastructure Team

Python
Linux
TypeScript
  • 3+ years of non-internship professional software development experience
  • 2+ years of system design/architecture experience
  • Experience with at least one programming language
  • Experience developing CI/CD pipelines (Jenkins preferred)
  • Proficiency with Linux and containers
  • Experience with clustered ML or HPC applications
  • Experience coding in Python, TypeScript, and CDK
  • Experience creating automated dashboards and visualizations

Benefits For Software Development Engineer, ML Infrastructure Team

  • Medical Insurance
  • 401k
