AWS Neuron is seeking a Software Development Engineer to join its Machine Learning Applications (ML Apps) team, which owns the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators. The role is central to developing and optimizing performance across a range of ML model families, including large language models such as Llama 2, GPT-2, and GPT-3, as well as Stable Diffusion and Vision Transformers.
The position involves working closely with compiler and runtime engineers to build distributed inference solutions on Trn1 instances. You'll be responsible for optimizing inference on large models for both latency and throughput using Python, PyTorch, and JAX. Experience with DeepSpeed and other distributed inference libraries is essential.
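To give a flavor of the latency-versus-throughput tension this role centers on, here is a minimal, purely illustrative sketch (not Neuron- or DeepSpeed-specific; all cost numbers are hypothetical) of how request batching trades per-request latency for aggregate throughput:

```python
# Illustrative cost model: each batch pays a fixed overhead plus a
# per-item cost. Larger batches amortize the overhead (throughput up)
# but every request waits for its whole batch (latency up).
# All timing constants below are made-up defaults for illustration.

def serve(num_requests, batch_size, per_batch_overhead_ms=10.0, per_item_ms=2.0):
    """Return (avg_latency_ms, throughput_req_per_s) for batched serving."""
    batches = -(-num_requests // batch_size)  # ceiling division
    total_ms = batches * (per_batch_overhead_ms + batch_size * per_item_ms)
    avg_latency_ms = total_ms / batches       # a request waits for its batch
    throughput = num_requests / (total_ms / 1000.0)
    return avg_latency_ms, throughput

if __name__ == "__main__":
    for bs in (1, 8, 32):
        lat, thr = serve(num_requests=1024, batch_size=bs)
        print(f"batch={bs:3d}  latency~{lat:6.1f} ms  throughput~{thr:7.1f} req/s")
```

Under this toy model, batch size 1 minimizes latency while batch size 32 maximizes throughput; real accelerator workloads navigate the same tradeoff with far more dimensions (sharding, memory bandwidth, kernel fusion).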
As part of a startup-like development environment, you'll build high-impact solutions for a large customer base, participate in design discussions, conduct code reviews, and collaborate with internal and external stakeholders. The team emphasizes knowledge-sharing and mentorship, providing opportunities for career growth through increasingly complex technical challenges.
The role combines deep technical expertise in machine learning systems with practical software engineering, requiring strong skills in C++/Python and comprehensive ML knowledge. You'll be working at the forefront of ML infrastructure, helping to shape the future of cloud-based machine learning acceleration.
Amazon offers a competitive compensation package including base pay ranging from $129,300 to $223,600 depending on location, plus equity, sign-on payments, and comprehensive benefits. Join a team that's dedicated to innovation and technical excellence in the rapidly evolving field of machine learning infrastructure.