AWS Utility Computing (UC) is at the forefront of cloud innovation, developing and managing critical services across Compute, Database, Storage, and Platform solutions. This senior role is within the Machine Learning Applications (ML Apps) team for AWS Neuron, focusing on the complete software stack for AWS Inferentia and Trainium cloud-scale machine learning accelerators. The position involves working with cutting-edge AI technologies, including large language models like GPT-2/3 and vision transformers.
The role combines deep expertise in distributed systems and machine learning, and requires close collaboration with chip architects and compiler engineers. You'll develop and optimize distributed training solutions using frameworks such as PyTorch, TensorFlow, and JAX, while extracting maximum performance from AWS's custom silicon; a simplified sketch of this kind of data-parallel training work follows below.
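To give a flavor of the work, here is a minimal, generic sketch of data-parallel training using PyTorch's DistributedDataParallel. It is illustrative only: it uses a toy model and the CPU "gloo" backend, not the Neuron-specific stack that actual Trainium training jobs would run on.

```python
# Minimal sketch of data-parallel training with PyTorch DistributedDataParallel (DDP).
# Illustrative only: toy model, "gloo" CPU backend; real Trainium jobs would go
# through the Neuron SDK rather than this generic setup.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank: int, world_size: int) -> None:
    # Each worker joins the process group; rank 0 hosts the rendezvous.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(nn.Linear(16, 4))  # gradients are all-reduced across workers
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for _ in range(10):
        inputs = torch.randn(8, 16)
        targets = torch.randn(8, 4)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()  # DDP synchronizes gradients during backward
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    # Single-process demo; real jobs launch one process per accelerator (e.g. via torchrun).
    train(rank=0, world_size=1)
```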
The team culture emphasizes knowledge-sharing and mentorship, with senior members providing one-on-one guidance and thorough code reviews. AWS values diverse experiences and backgrounds, fostering an inclusive environment through employee-led affinity groups and ongoing learning opportunities.
Working at AWS means joining a team that's pioneering cloud computing innovation. You'll have access to career growth resources, mentorship opportunities, and a strong work-life harmony culture. The position offers competitive compensation, including base pay ranging from $151,300 to $261,500 depending on location, plus potential equity and comprehensive benefits.
This role is ideal for experienced engineers passionate about machine learning infrastructure who want to shape how the world's most advanced AI models are trained and deployed at scale. You'll work at the intersection of hardware and software, building distributed AI training systems on custom silicon.