AWS Utility Computing (UC) is at the forefront of cloud innovation, providing foundational services like S3 and EC2. The HyperPod Engines team is specifically focused on building a resilient platform for deep learning training through Amazon SageMaker HyperPod, which scales and accelerates generative AI model development across thousands of AI accelerators.
As a Software Development Engineer in this role, you'll work with cutting-edge AI technologies, developing training frameworks and communication libraries. You'll be hands-on with frameworks like PyTorch, NeMo, and Megatron, and with collective communication libraries such as NCCL. A significant part of your work will involve training and fine-tuning large language models like Llama.
The position offers the opportunity to work in a fast-paced, cross-disciplinary environment alongside engineers and researchers who are leaders in the field. You'll tackle challenging problems, develop innovative solutions, and deliver production-ready implementations that directly impact customer-facing products.
Amazon offers a comprehensive benefits package and values work-life harmony. The company is committed to diversity and inclusion, providing various employee-led affinity groups and inclusion events. Career growth opportunities include extensive knowledge-sharing and mentorship programs.
The role is based in Santa Clara, CA, and offers competitive compensation ranging from $129,300 to $223,600 per year, depending on location and experience. This is an excellent opportunity for someone with strong software development skills who wants to work at the intersection of cloud computing and artificial intelligence.