Software Development Engineer, SageMaker HyperPod Data Plane

Amazon is a global technology company and leader in e-commerce, cloud computing, and artificial intelligence.
$129,300 - $223,600
Backend
Mid-Level Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For Software Development Engineer, SageMaker HyperPod Data Plane

At AWS AI, we are building the next-generation AI platform to accelerate customer development in LLMs and Generative AI. This role is part of the Amazon SageMaker team, focused on making deep learning workload training accessible in the cloud. As a Software Development Engineer, you'll be instrumental in designing, developing, and deploying distributed machine learning systems for our global customer base.

The position involves working on cutting-edge technology for large-scale deep learning model training, handling models with 100+ billion parameters and managing thousands of GPU devices. You'll be part of a team that's directly impacting AWS's AI infrastructure and the broader machine learning community.

Key responsibilities include developing innovative solutions for LLM training in clustered environments, optimizing distributed training performance, and serving as a technical lead on complex projects. You'll work with internal teams, technology partners, and the open-source community, particularly with frameworks like PyTorch and NVIDIA/GPU technologies.

The ideal candidate will have strong experience in multi-threaded asynchronous C++/Go development, Kubernetes, high-performance computing, and building scalable systems. You should be comfortable with ambiguity, have strong analytical skills, and thrive in an entrepreneurial environment.

AWS offers a collaborative and inclusive culture with ten employee-led affinity groups across 190 global chapters. The team values work-life balance and provides flexibility in working hours. You'll have opportunities for mentorship and career growth, working alongside experienced professionals in a supportive environment that celebrates knowledge sharing.

This role offers competitive compensation based on location and experience, with additional benefits including equity, sign-on payments, and comprehensive medical and financial benefits. Join us in shaping the future of AI infrastructure and help our customers leverage the power of machine learning at scale.

Responsibilities For Software Development Engineer, SageMaker HyperPod Data Plane

  • Develop innovative solutions to support large language model training across a cluster of nodes
  • Develop and maintain a performant, resilient, and fully managed service for training large-scale foundation models
  • Optimize distributed training by profiling workloads and identifying bottlenecks
  • Serve as a key technical resource in the full development cycle
  • Own delivery of an entire piece of the system and serve as technical lead
  • Hire and mentor junior development engineers

Requirements For Software Development Engineer, SageMaker HyperPod Data Plane

Python
Go
Kubernetes
  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship design or architecture experience
  • Experience programming with at least one software programming language
  • Experience in multi-threaded asynchronous C++/Go development
  • Experience with Kubernetes and high-performance computing
  • Experience in large language model training

Benefits For Software Development Engineer, SageMaker HyperPod Data Plane

Medical Insurance
401k
  • Work-life balance
  • Flexible working hours
  • Mentorship opportunities
  • Career growth opportunities
  • Employee-led affinity groups
  • Comprehensive benefits package
