Join the Edge AI team at Amazon Devices (Lab126) to architect and implement cutting-edge distributed training systems for large language models. As a Sr Software Dev Engineer, you'll build infrastructure that trains models of up to 400B parameters and enables their efficient deployment on edge devices. The role combines expertise in distributed systems, machine learning, and performance optimization.
You'll work on scaling training across GPU clusters, implementing advanced parallelism strategies, and developing novel compression techniques. The position requires collaboration with ML scientists to optimize training pipelines and ensure efficient model deployment on resource-constrained devices.
The Edge AI team at Lab126 develops next-generation AI capabilities for Amazon devices. We focus on the complete AI pipeline, from large-scale training to edge deployment, while maintaining privacy and optimizing for resource constraints. Our collaborative environment values technical expertise and practical problem-solving, and we take on challenges that push the boundaries of what's possible in edge AI.
Key responsibilities include designing high-performance training systems, implementing memory optimization techniques, and creating evaluation frameworks for compressed models. You'll work with state-of-the-art ML frameworks, optimize GPU utilization, and develop infrastructure that bridges the gap between massive-scale training and edge deployment.
The role offers competitive compensation of $151,300 to $261,500 per year, depending on location, plus equity and comprehensive benefits. Join us in revolutionizing how AI runs on edge devices while working with a diverse team of engineers and scientists at the forefront of AI innovation.