Software Development Engineer, SageMaker HyperPod Data Plane

Amazon

Amazon is a global technology company and leader in e-commerce, cloud computing, and artificial intelligence.

Santa Clara, CA, USA

$129,300 - $223,600

Backend

Mid-Level Software Engineer

In-Person

5,000+ Employees

3+ years of experience

AI · Enterprise SaaS

Description For Software Development Engineer, SageMaker HyperPod Data Plane

At AWS AI, we are building the next-generation AI platform to accelerate customer development in LLMs and Generative AI. This role is part of the Amazon SageMaker team, focused on making deep learning workload training accessible in the cloud. As a Software Development Engineer, you'll be instrumental in designing, developing, and deploying distributed machine learning systems for our global customer base.

The position involves working on cutting-edge technology for large-scale deep learning model training, handling models with 100+ billion parameters and managing thousands of GPU devices. You'll be part of a team that's directly impacting AWS's AI infrastructure and the broader machine learning community.

Key responsibilities include developing innovative solutions for LLM training in clustered environments, optimizing distributed training performance, and serving as a technical lead on complex projects. You'll work with internal teams, technology partners, and the open-source community, particularly with frameworks like PyTorch and NVIDIA/GPU technologies.

The ideal candidate will have strong experience in multi-threaded asynchronous C++/Go development, Kubernetes, high-performance computing, and building scalable systems. You should be comfortable with ambiguity, have strong analytical skills, and thrive in an entrepreneurial environment.

AWS offers a collaborative and inclusive culture with ten employee-led affinity groups across 190 global chapters. The team values work-life balance and provides flexibility in working hours. You'll have opportunities for mentorship and career growth, working alongside experienced professionals in a supportive environment that celebrates knowledge sharing.

This role offers competitive compensation based on location and experience, with additional benefits including equity, sign-on payments, and comprehensive medical and financial benefits. Join us in shaping the future of AI infrastructure and help our customers leverage the power of machine learning at scale.

Responsibilities For Software Development Engineer, SageMaker HyperPod Data Plane

Develop innovative solutions to support large language model training across a cluster of nodes
Develop and maintain a performant, resilient, and fully managed service for training large-scale foundation models
Optimize distributed training by profiling workloads and identifying bottlenecks
Serve as a key technical resource throughout the full development cycle
Own delivery of an entire piece of the system and serve as its technical lead
Hire and mentor junior development engineers

Requirements For Software Development Engineer, SageMaker HyperPod Data Plane

Python

Kubernetes

3+ years of non-internship professional software development experience
2+ years of non-internship design or architecture experience
Experience programming in at least one programming language
Experience in multi-threaded asynchronous C++/Go development
Experience with Kubernetes and high-performance computing
Experience in large language model training

Benefits For Software Development Engineer, SageMaker HyperPod Data Plane

Medical Insurance

401k

Work-life balance
Flexible working hours
Mentorship opportunities
Career growth opportunities
Employee-led affinity groups
Comprehensive benefits package
