Sr. Software Development Engineer, ML Infrastructure Team

AWS subsidiary that builds software and hardware for machine learning on EC2
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Development Engineer, ML Infrastructure Team

Join AWS's Machine Learning Infrastructure team at Annapurna Labs, where you'll lead the development of critical tools and infrastructure supporting AWS ML and High Performance Computing (HPC) technologies. As a Senior Software Development Engineer, you'll build and maintain the infrastructure that monitors and reports on the functionality and performance of large-scale testing workloads. The role combines expertise in CI/CD automation, ML/HPC benchmarks, and cloud infrastructure to support cutting-edge AWS offerings including Trainium, Graviton, and Elastic Fabric Adapter (EFA).

You'll work with technologies like TypeScript, CDK, SLURM, and Active Directory to create efficient, cost-effective cluster management solutions. The position requires strong technical leadership, with opportunities to mentor other engineers and communicate effectively with stakeholders. You'll be part of Annapurna Labs, an AWS subsidiary focused on building innovative software and hardware solutions that make ML on EC2 more effective.

The team's mission is to make AWS the most cost-effective platform for AI at scale. You'll join a culture that values diversity, work-life harmony, and continuous learning. The role offers comprehensive benefits, career growth opportunities, and the chance to work with cutting-edge ML and cloud technologies. This position is perfect for experienced engineers who want to impact the future of cloud-based machine learning infrastructure while working with a dedicated team of innovators.


Responsibilities For Sr. Software Development Engineer, ML Infrastructure Team

  • Lead the engineering team that builds and maintains infrastructure for monitoring testing workloads
  • Automate software delivery using CI/CD tools and AWS products
  • Develop Python code for managing large clusters and ML/HPC workloads
  • Create performance monitoring dashboards using AWS Managed Grafana and Athena
  • Implement automatic regression detection mechanisms
  • Manage complex infrastructure across multiple instance types and software stacks

Requirements For Sr. Software Development Engineer, ML Infrastructure Team

Python · TypeScript · Linux · Kubernetes
  • 5+ years of professional software development experience
  • 5+ years of system design and architecture experience
  • 5+ years of full software development life cycle experience
  • Experience as a mentor or tech lead
  • 5+ years coding in Python, TypeScript, and CDK
  • Experience with CI/CD pipelines (Jenkins preferred)
  • Proficiency with Linux and containers
  • Experience with clustered ML or HPC applications

Benefits For Sr. Software Development Engineer, ML Infrastructure Team

Medical Insurance · 401k · Parental Leave
  • Comprehensive medical benefits
  • Work-life harmony focus
  • Career development and mentorship opportunities
  • Employee-led affinity groups
  • Ongoing learning experiences


Jobs Related To Amazon Sr. Software Development Engineer, ML Infrastructure Team

Software Development Engineer, AGI Sensory ASR Inference

Senior Software Engineering role on Amazon's AGI team, focusing on high-performance inference software development and AI system optimization.

Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Senior Software Engineer position at AWS focusing on AI/ML distributed training solutions using AWS Neuron technology stack.

Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Senior Software Engineer position on the AWS Neuron Distributed Training team, focusing on AI/ML development for cloud-scale machine learning accelerators.

Sr. Software Engineer- AI/ML, AWS Neuron Apps

Senior Software Engineering role at AWS focusing on machine learning infrastructure and optimization for cloud-scale ML accelerators.

Sr. Software Engineer- AI/ML, AWS Neuron Apps

Senior Software Engineer position at AWS focusing on AI/ML infrastructure development and optimization, working with cutting-edge machine learning technologies and custom silicon accelerators.