Sr Software Dev Engineer, Edge AI ML Platform (Level 6), Edge AI

Amazon is a global technology company that builds devices and AI capabilities through Lab126.
$151,300 - $261,500
Machine Learning
Staff Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Job Description

Join the Edge AI team at Amazon Devices (Lab126) to architect and implement cutting-edge distributed training systems for large language models. As a Sr Software Dev Engineer, you'll be responsible for building infrastructure that trains models with up to 400B parameters and enables their efficient deployment on edge devices. The role combines expertise in distributed systems, machine learning, and performance optimization.

You'll work on scaling training across GPU clusters, implementing advanced parallelism strategies, and developing novel compression techniques. The position requires collaboration with ML scientists to optimize training pipelines and ensure efficient model deployment on resource-constrained devices.

The Edge AI team at Lab126 is dedicated to developing next-generation AI capabilities for Amazon devices. We focus on the complete AI pipeline - from large-scale training to edge deployment - while maintaining privacy and optimizing for resource constraints. Our collaborative environment values technical expertise and practical problem-solving, tackling challenges that push the boundaries of what's possible in edge AI.

Key responsibilities include designing high-performance training systems, implementing memory optimization techniques, and creating evaluation frameworks for compressed models. You'll work with state-of-the-art ML frameworks, optimize GPU utilization, and develop infrastructure that bridges the gap between massive-scale training and edge deployment.

The role offers competitive compensation ranging from $151,300 to $261,500 per year based on location, plus equity and comprehensive benefits. Join us in revolutionizing how AI runs on edge devices while working with a diverse team of engineers and scientists at the forefront of AI innovation.

Last updated a month ago

Responsibilities For Sr Software Dev Engineer, Edge AI ML Platform (Level 6), Edge AI

  • Architect and implement distributed training systems that scale across hundreds or thousands of GPUs
  • Design and optimize data parallelism, tensor parallelism, and pipeline parallelism strategies for large language models
  • Implement memory optimization techniques like activation recomputation, ZeRO, and mixed precision training
  • Develop infrastructure that supports novel distillation and compression techniques for edge deployment
  • Create evaluation frameworks to measure performance of compressed models on target edge hardware
  • Collaborate with ML scientists to optimize training for downstream compression requirements
  • Benchmark and profile training configurations to maximize throughput and GPU utilization
  • Build pipelines that connect large-scale training to edge model deployment workflows

Requirements For Sr Software Dev Engineer, Edge AI ML Platform (Level 6), Edge AI

Python
Kubernetes
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience with at least one programming language
  • 5+ years of experience leading the design or architecture of new and existing systems
  • Experience as a mentor or tech lead, or leading an engineering team
  • Experience with distributed systems or high-performance computing
  • Proficiency in Python and at least one systems programming language (C++, Rust, etc.)
  • Experience with machine learning frameworks such as PyTorch or TensorFlow
  • Understanding of GPU programming and optimization techniques

Benefits For Sr Software Dev Engineer, Edge AI ML Platform (Level 6), Edge AI

  • Medical Insurance
  • 401k
