AWS Utility Computing (UC) is at the forefront of cloud innovation, providing foundational services like S3 and EC2. Within the AWS AI division, the SageMaker Training team is building cutting-edge services to empower data scientists and software engineers in their deep learning endeavors. As customers rapidly adopt LLMs and Generative AI, the team is developing a next-generation AI platform optimized for large-scale model training.
The role offers an opportunity to work on pioneering technology that impacts AWS's global customer base. You'll be responsible for architecting and building distributed machine learning systems that can train massive models (for example, GPT-style models with 100+ billion parameters) across thousands of GPUs. The position combines technical leadership with hands-on development, requiring expertise in high-performance computing, scalable systems, and machine learning infrastructure.
The team collaborates closely with leading technology companies and the open source community, including PyTorch and NVIDIA. You'll work in an entrepreneurial environment that values innovation, ownership, and analytical thinking. The role offers exposure to cutting-edge AI technology, and you'll build products that directly shape how companies adopt and implement machine learning at scale.
AWS provides a supportive environment focused on learning and career growth. The company values diverse experiences and perspectives, fostering inclusion through employee-led affinity groups and ongoing learning opportunities. Work-life harmony is emphasized, with flexibility built into the working culture. As part of AWS's mission to be Earth's Best Employer, you'll find extensive resources for knowledge-sharing, mentorship, and professional development.