AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators and the Trn1 and Inf1 instances built on them. This role is for a software engineer on the Machine Learning Applications (ML Apps) team for AWS Neuron. You will be responsible for the development, enablement, and performance tuning of a wide variety of ML model families, including massive-scale large language models such as Llama2, GPT2, GPT3, and beyond, as well as stable diffusion, Vision Transformers, and many more.
The ideal candidate will have strong software development skills in C++ and Python and deep machine learning knowledge. Experience optimizing inference performance for both latency and throughput on large models using Python, PyTorch, or JAX is essential, and familiarity with DeepSpeed and other distributed inference libraries is crucial.
You'll be working in a startup-like development environment, always focusing on the most important tasks. The team is dedicated to supporting new members and spans a mix of experience levels and tenures. We celebrate knowledge-sharing and mentorship, with senior members providing one-on-one mentoring and thorough code reviews.
Join AWS Neuron and be at the forefront of cloud-scale machine learning acceleration!