
SDE 2, SageMaker AI Platforms

Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, pioneering cloud computing.
Machine Learning
Mid-Level Software Engineer
In-Person
5,000+ Employees
2+ years of experience
AI · Enterprise SaaS · Cloud

Description For SDE 2, SageMaker AI Platforms

AWS Utility Computing (UC) is at the forefront of cloud innovation, providing foundational services like S3 and EC2. This role is specifically within the AWS AI organization, focusing on Amazon SageMaker, which aims to simplify deep learning workloads in the cloud. As customers increasingly adopt LLMs and Generative AI, the team is building a next-generation AI platform to accelerate development.

As an SDE 2 on the SageMaker team, you'll be instrumental in designing and developing distributed machine learning systems at scale. You'll work closely with ML scientists and customers to shape strategy and define roadmaps. The role involves building innovative solutions for large language model training, optimizing distributed training performance, and maintaining a fully-managed service for training foundation models.

The position offers opportunities to work with cutting-edge AI technologies, collaborate with leading technology companies, and contribute to open-source communities such as PyTorch and the NVIDIA GPU ecosystem. You'll be part of AWS's larger mission to democratize AI and machine learning capabilities for businesses worldwide.

The team culture emphasizes learning, curiosity, and inclusion, with various employee-led affinity groups and ongoing learning experiences. AWS values work-life harmony and provides strong mentorship and career growth opportunities. The role combines technical leadership with hands-on development, making it ideal for engineers passionate about AI/ML infrastructure and distributed systems.

Working at Amazon Web Services means joining the world's most comprehensive cloud platform provider, with opportunities to influence how businesses worldwide adopt and implement AI technologies. The role offers exposure to large-scale systems, cutting-edge AI infrastructure, and the chance to work with a global customer base.

Last updated 10 days ago

Responsibilities For SDE 2, SageMaker AI Platforms

  • Develop innovative solutions to support large language model training across a cluster of nodes
  • Develop and maintain a performant, resilient, and fully managed service for training large-scale foundation models
  • Optimize distributed training by profiling, identifying bottlenecks, and improving performance
  • Serve as a key technical resource throughout the full development cycle
  • Own delivery of an entire piece of the system and serve as technical lead on complex projects
  • Hire and mentor junior development engineers

Requirements For SDE 2, SageMaker AI Platforms

  • Python
  • Kubernetes
  • 2+ years of non-internship design or architecture experience
  • 3+ years of Video Games Industry experience
  • Experience programming with at least one software programming language
  • Experience with full software development life cycle
  • Bachelor's degree in computer science or equivalent (preferred)

Benefits For SDE 2, SageMaker AI Platforms

  • Work-life harmony
  • Mentorship opportunities
  • Career growth opportunities
  • Inclusive team culture
  • Employee-led affinity groups
