AWS Utility Computing (UC) is at the forefront of cloud innovation, providing foundational services like Amazon S3 and Amazon EC2. The Amazon SageMaker Training team builds services that support data scientists and software engineers in their deep learning work. As customers increasingly adopt LLMs and Generative AI, we're developing a next-generation AI platform optimized for large-scale model training.
As a Software Development Engineer II, you'll join a team focused on building distributed machine learning systems that operate at massive scale. You'll work on training models with more than 100 billion parameters across thousands of GPUs. The role combines research with practical implementation and requires expertise in high-performance computing and scalable systems architecture.
The position offers unique opportunities to collaborate with leading technology companies and the open-source community, including PyTorch and NVIDIA. You'll be instrumental in designing and implementing solutions that help customers leverage the power of AI and machine learning at scale. The team values innovation, technical excellence, and the ability to deliver results in a fast-paced environment.
AWS provides a collaborative environment where you can grow professionally through mentorship, knowledge-sharing, and career advancement resources. The company emphasizes work-life harmony and maintains an inclusive culture through employee-led affinity groups and ongoing learning experiences. You'll be part of a diverse team that's committed to becoming Earth's Best Employer while pushing the boundaries of what's possible in cloud computing and AI.
This role is perfect for someone who is passionate about building large-scale AI infrastructure, has strong technical abilities, and wants to make a significant impact on how the world uses machine learning technology. You'll have the opportunity to work on challenging problems, influence the direction of AWS's AI platform, and help customers achieve their machine learning goals.