Software Development Engineer III, AWS SageMaker Training

Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, pioneering cloud computing and continuous innovation.
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS · Cloud

Description For Software Development Engineer III, AWS SageMaker Training

AWS Utility Computing (UC) is at the forefront of cloud innovation, providing foundational services like S3 and EC2. Within the AWS AI division, the SageMaker Training team is building cutting-edge services to empower data scientists and software engineers in their deep learning endeavors. As customers rapidly adopt LLMs and Generative AI, the team is developing a next-generation AI platform optimized for large-scale model training.

The role offers an opportunity to work on pioneering technology that impacts AWS's global customer base. You'll be responsible for architecting and building distributed machine learning systems that can train massive models (GPT-style models with 100+ billion parameters) across thousands of GPUs. The position combines technical leadership with hands-on development, requiring expertise in high-performance computing, scalable systems, and machine learning infrastructure.

The team collaborates closely with leading technology companies and the open source community, including PyTorch and NVIDIA. You'll work in an entrepreneurial environment that values innovation, ownership, and analytical thinking. The role offers exposure to cutting-edge AI technology while building products that directly impact how companies adopt and implement machine learning at scale.

AWS provides a supportive environment focused on learning and career growth. The company values diverse experiences and perspectives, fostering inclusion through employee-led affinity groups and ongoing learning opportunities. Work-life harmony is emphasized, with flexibility built into the working culture. As part of AWS's mission to be Earth's Best Employer, you'll find extensive resources for knowledge-sharing, mentorship, and professional development.


Responsibilities For Software Development Engineer III, AWS SageMaker Training

  • Design, develop, test, and deploy distributed machine learning systems
  • Build and improve the next-generation AI platform
  • Collaborate with ML scientists and customers to influence overall strategy
  • Drive system architecture and best practices
  • Coach and develop junior engineers
  • Build large-scale solutions for a worldwide customer base

Requirements For Software Development Engineer III, AWS SageMaker Training

Python · Go
  • 5+ years of programming experience
  • 5+ years of experience leading design or architecture
  • Experience as a mentor or tech lead, or leading an engineering team
  • Experience in multi-threaded, asynchronous C++ or Go development
  • Experience in resource orchestrators, high performance computing, or large language model training
  • Bachelor's degree in computer science or equivalent (preferred)

Benefits For Software Development Engineer III, AWS SageMaker Training

  • Work-life balance
  • Career development and mentorship opportunities
  • Inclusive team culture
  • Employee-led affinity groups
  • Knowledge-sharing resources

