AWS Neuron is seeking a Senior Software Engineer to join its Machine Learning Inference Applications team, working on the complete software stack for AWS Inferentia and Trainium cloud-scale machine learning accelerators. This role offers an exciting opportunity to work at the cutting edge of LLM optimization and inference.
The position involves developing and optimizing core components of Large Language Model inference, including attention mechanisms, MLP layers, quantization techniques, speculative decoding, and Mixture of Experts. You'll work directly with massive models such as Llama 3.3 70B, Llama 3.1 405B, DBRX, and Mixtral, ensuring optimal performance and accuracy on Neuron devices.
What makes this role unique is the close collaboration with chip architects, compiler engineers, and runtime engineers, allowing you to influence the entire stack from hardware to software. The team culture strongly emphasizes knowledge-sharing and mentorship, with senior members providing one-on-one mentoring and thorough code reviews.
The role requires strong software development skills with at least 3 years of professional experience, a deep understanding of machine learning fundamentals, and hands-on experience with model optimization. Experience with PyTorch or JAX, particularly in deploying LLMs in production environments, is highly valued.
Amazon offers competitive compensation ranging from $129,300 to $223,600 based on location and experience, plus equity and comprehensive benefits. The position is based in Seattle, WA, offering the opportunity to work with one of the world's leading cloud providers in the rapidly evolving field of AI/ML infrastructure.
This is an excellent opportunity for engineers passionate about machine learning optimization who want to work on cutting-edge technology that powers some of the most advanced AI models in production today.