AWS Neuron is seeking a Senior Software Engineer to join its Machine Learning Inference Applications team. This role focuses on developing and optimizing core components of Large Language Model (LLM) inference for AWS Inferentia and Trainium cloud-scale machine learning accelerators. The position involves working with cutting-edge LLM technology, including attention mechanisms, multilayer perceptron (MLP) layers, quantization, and speculative decoding.
The successful candidate will collaborate closely with chip architects, compiler engineers, and runtime engineers to maximize performance and accuracy across models such as Llama 3.3 70B, Llama 3.1 405B, DBRX, and Mixtral. The team emphasizes knowledge-sharing and mentorship, providing opportunities for career growth through challenging projects and supportive code reviews.
This role offers competitive compensation ranging from $129,300 to $223,600, depending on location and experience, plus additional benefits including equity, sign-on payments, and comprehensive medical coverage. The position is based in Seattle, WA, and requires at least 3 years of professional software development experience along with strong fundamentals in machine learning model architecture and optimization.
Amazon's commitment to innovation in AI/ML technology, combined with the team's collaborative culture and focus on personal development, makes this an excellent opportunity for engineers passionate about advancing the field of machine learning inference at scale.