AWS Neuron is seeking a Machine Learning Engineer to join its ML Applications (ML Apps) team, which owns the complete software stack for AWS Inferentia and Trainium, AWS's cloud-scale machine learning accelerators. The role combines software engineering expertise with machine learning specialization, working on cutting-edge technology that powers AWS's ML infrastructure.
The position involves developing and optimizing a range of ML model families, including large language models such as Llama 2, GPT-2, and GPT-3, as well as Stable Diffusion and Vision Transformers. You'll work closely with compiler and runtime engineers to build distributed inference solutions for Trn1 systems. The role requires strong programming skills in Python and C++ and a deep understanding of ML optimization techniques.
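For a flavor of the day-to-day work, the sketch below shows how a model might be compiled ahead of time for a Neuron device with torch-neuronx. This is a minimal example under stated assumptions, not part of the job description: it assumes a Trn1/Inf2 instance with the AWS Neuron SDK and Hugging Face transformers installed, and the model choice, input shapes, and output file name are illustrative.

```python
# Illustrative sketch: ahead-of-time compilation of a PyTorch model for a
# Neuron device. Assumes torch-neuronx is installed on a Trn1/Inf2 instance;
# the model and shapes below are hypothetical examples, not role specifics.
import torch
import torch_neuronx
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

# Example inputs pin the tensor shapes the Neuron compiler will specialize for.
inputs = tokenizer("Hello, Neuron!", return_tensors="pt",
                   padding="max_length", max_length=128)
example = (inputs["input_ids"], inputs["attention_mask"])

# torch_neuronx.trace compiles the model for the accelerator ahead of time.
neuron_model = torch_neuronx.trace(model, example)

# The compiled artifact behaves like a TorchScript module and can be saved.
torch.jit.save(neuron_model, "bert_neuron.pt")
```

The key design point here is ahead-of-time compilation: fixed example shapes let the compiler specialize the graph for the silicon, which is typical of the performance work this role involves.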
As part of the ML Apps team, you'll be responsible for building distributed inference support into frameworks like PyTorch and TensorFlow, while ensuring optimal performance on AWS Trainium and Inferentia silicon. The team operates in a startup-like environment, focusing on high-impact projects that directly affect AWS's large customer base.
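To make "distributed inference support" concrete, here is a framework-level sketch of the core idea behind tensor parallelism, written in plain PyTorch so it runs on CPU. On Trainium the shards would live on separate NeuronCores and the final concatenation would be a collective (e.g., an all-gather); the shard_linear helper is hypothetical and for illustration only.

```python
# Minimal sketch of column-parallel sharding for a linear layer, the core
# idea behind tensor-parallel distributed inference. Runs on CPU here; on
# real hardware each shard would live on its own device/NeuronCore.
import torch
import torch.nn as nn

def shard_linear(layer: nn.Linear, num_shards: int):
    """Split a linear layer's output features across num_shards workers."""
    assert layer.out_features % num_shards == 0
    chunk = layer.out_features // num_shards
    shards = []
    for i in range(num_shards):
        piece = nn.Linear(layer.in_features, chunk, bias=layer.bias is not None)
        # Each shard takes a contiguous row-slice of the full weight matrix.
        piece.weight.data = layer.weight.data[i * chunk:(i + 1) * chunk]
        if layer.bias is not None:
            piece.bias.data = layer.bias.data[i * chunk:(i + 1) * chunk]
        shards.append(piece)
    return shards

full = nn.Linear(16, 32)
shards = shard_linear(full, num_shards=4)

x = torch.randn(2, 16)
# Each shard computes a slice of the output; gathering recovers the full result.
gathered = torch.cat([s(x) for s in shards], dim=-1)
assert torch.allclose(gathered, full(x), atol=1e-6)
```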
The role offers competitive compensation ranging from $129,300 to $223,600 based on location and experience, plus additional benefits including equity and comprehensive medical coverage. You'll be part of a collaborative team that values knowledge-sharing and mentorship, with opportunities for career growth and skill development.
Key responsibilities include performance tuning, architecture design, code reviews, and cross-functional collaboration. The ideal candidate has 3+ years of software development experience, strong ML knowledge, and expertise in distributed systems and optimization techniques. This is an excellent opportunity for someone passionate about machine learning infrastructure and ML accelerator technology.