AWS Neuron is seeking a Senior Software Development Engineer to join its Machine Learning Inference Model Enablement team. This role focuses on developing and optimizing large-scale machine learning models, particularly LLMs such as the Llama family and DeepSeek, for AWS's cloud infrastructure.
The position involves working with Inferentia and Trainium, AWS's cloud-scale machine learning accelerators, and requires expertise in both software development and machine learning. You'll collaborate closely with compiler and runtime engineers to build distributed inference solutions, optimizing performance for both latency and throughput.
As a senior engineer, you'll lead initiatives to build distributed inference support for PyTorch in the Neuron SDK, working in a startup-like environment where impact and innovation are prioritized. The role demands strong Python programming skills combined with deep ML knowledge, particularly in optimizing large language models.
The team offers a supportive environment that values knowledge-sharing and mentorship, with opportunities for career growth through increasingly complex technical challenges. You'll be working at the intersection of cutting-edge ML technology and cloud computing, helping shape the future of AI infrastructure at AWS.
Key responsibilities include performance tuning of various model families, extending distributed inference techniques, and ensuring optimal efficiency of models running on AWS silicon. You'll also participate in design discussions, code reviews, and cross-functional collaboration to drive technical decisions that impact AWS's large customer base.
This position offers competitive compensation ranging from $151,300 to $261,500 per year, depending on location, plus additional benefits including equity and comprehensive healthcare. Join Amazon's AI/ML team to work on technology that's transforming how businesses leverage machine learning at scale.