Software Development Engineer, SageMaker

Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, and a pioneer in cloud computing and continuous innovation.
$129,300 - $223,600
Machine Learning
Mid-Level Software Engineer
Hybrid
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For Software Development Engineer, SageMaker

At AWS AI, we're building the next-generation AI platform to accelerate LLM and Generative AI development through Amazon SageMaker. This role focuses on developing distributed machine learning systems and large-scale solutions for our worldwide customer base. You'll be working on the SageMaker HyperPod team, building the platform and products for large-scale deep learning model training (GPT models with 100+ billion parameters, running on thousands of GPU devices).

The position offers an opportunity to work with cutting-edge AI technology and shape the future of machine learning infrastructure. You'll collaborate with ML scientists and customers to influence overall strategy and define the team's roadmap. The role involves designing and implementing robust, scalable solutions while maintaining high engineering standards.

AWS provides a dynamic work environment with emphasis on work-life harmony, offering flexible hybrid work arrangements. The company strongly values diversity and inclusion, demonstrated through employee-led affinity groups and ongoing learning experiences. Career growth is supported through mentorship and knowledge-sharing opportunities.

Key technical aspects include:

  • Building a next-generation AI platform using Kubernetes
  • Optimizing distributed training systems
  • Collaborating with leading technology companies and open source communities
  • Working with technologies such as PyTorch and NVIDIA GPUs
  • Developing solutions for large-scale model training

The role combines technical leadership with hands-on development, requiring both strong engineering skills and the ability to mentor others. You'll be part of AWS's mission to democratize AI technology while working with some of the most advanced machine learning infrastructure in the industry.


Responsibilities For Software Development Engineer, SageMaker

  • Develop innovative solutions to support large language model (LLM) training across a cluster of nodes
  • Develop and maintain a performant, resilient, and fully managed service for training large-scale foundation models
  • Optimize distributed training by profiling and addressing bottlenecks
  • Serve as technical lead on complex projects
  • Hire and mentor junior development engineers

Requirements For Software Development Engineer, SageMaker

  • 3+ years of non-internship professional software development experience
  • 2+ years of design or architecture experience
  • Experience programming in at least one programming language
  • Experience in multi-threaded, asynchronous C++/Go development
  • Experience with resource orchestrators such as Kubernetes
  • Experience with large language model training

Benefits For Software Development Engineer, SageMaker

  • Medical Insurance
  • 401k
