Software Development Engineer II, AWS SageMaker Training

Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, and a pioneer of cloud computing.
Machine Learning
Mid-Level Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS · Cloud

Description For Software Development Engineer II, AWS SageMaker Training

AWS Utility Computing (UC) is at the forefront of cloud innovation, providing foundational services like Amazon S3 and EC2. The AWS SageMaker Training team is building cutting-edge services to empower data scientists and software engineers in their deep learning endeavors. As customers increasingly adopt LLMs and Generative AI, we're developing a next-generation AI platform optimized for large-scale model training.

As a Software Development Engineer II, you'll join a dynamic team that builds distributed machine learning systems operating at massive scale. You'll work on training models with 100+ billion parameters across thousands of GPU devices. The role combines innovative research with practical implementation and requires expertise in high-performance computing and scalable systems architecture.

The position offers unique opportunities to collaborate with leading technology companies such as NVIDIA and with open-source communities such as PyTorch. You'll be instrumental in designing and implementing solutions that help customers leverage the power of AI and machine learning at scale. The team values innovation, technical excellence, and the ability to deliver results in a fast-paced environment.

AWS provides a collaborative environment where you can grow professionally through mentorship, knowledge-sharing, and career advancement resources. The company emphasizes work-life harmony and maintains an inclusive culture through employee-led affinity groups and ongoing learning experiences. You'll be part of a diverse team that's committed to becoming Earth's Best Employer while pushing the boundaries of what's possible in cloud computing and AI.

This role is perfect for someone who is passionate about building large-scale AI infrastructure, has strong technical abilities, and wants to make a significant impact on how the world uses machine learning technology. You'll have the opportunity to work on challenging problems, influence the direction of AWS's AI platform, and help customers achieve their machine learning goals.

Responsibilities For Software Development Engineer II, AWS SageMaker Training

  • Design, develop, test and deploy distributed machine learning systems
  • Build and improve the next-generation AI platform
  • Collaborate with ML scientists and customers to influence the overall strategy
  • Drive system architecture and best practices
  • Coach and develop junior engineers
  • Build scalable solutions for a worldwide customer base

Requirements For Software Development Engineer II, AWS SageMaker Training

Python · Go · Kubernetes
  • 3+ years of non-internship professional software development experience
  • 2+ years of design/architecture experience
  • Experience with at least one programming language
  • Experience in multi-threaded asynchronous C++ or Go development
  • Experience with resource orchestrators, high performance computing, or large language model training
  • Experience with the full software development lifecycle

Benefits For Software Development Engineer II, AWS SageMaker Training

Medical Insurance · Dental Insurance · Vision Insurance
  • Work-life balance
  • Career development opportunities
  • Mentorship programs
  • Inclusive work culture
  • Employee-led affinity groups