Sr. Software Development Engineer, HPC/ML Networking Engineer, Annapurna Labs

Annapurna Labs is an integral part of AWS that develops hardware and software components for EC2 infrastructure, specializing in the software, systems, and chips that improve the AWS customer experience.
$151,300 - $261,500
Distributed Systems
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Development Engineer, HPC/ML Networking Engineer, Annapurna Labs

Annapurna Labs, a crucial part of AWS, is seeking an experienced engineer to work on distributed AI/ML systems. This role focuses on developing collective operations that enable AI workloads to scale across multiple accelerators and servers. The position involves low-level C/C++ work and requires solid knowledge of Linux, the kernel, and performance optimization.

The team is at the forefront of AI/ML development, building features for the largest clusters and AI models. You'll be part of a diverse, international team, collaborating with infrastructure experts, hardware engineers, RTL engineers, scientists, and architects. The organization values mentorship: team members both receive guidance and provide it to others.

Key Responsibilities:

  • Developing and optimizing networking solutions for Machine Learning and High-Performance Computing workloads
  • Working with HPC and ML customers to deliver scalable solutions
  • Contributing to the development of fundamental operations for AI scaling
  • Collaborating with cross-functional teams across hardware and software domains

The role offers:

  • Flexible working hours and strong work-life balance
  • Opportunity to work with principal-level engineers and directors
  • Clear career growth paths and continuous learning opportunities
  • International team environment with diverse perspectives
  • Competitive compensation package ranging from $151,300 to $261,500 based on location

Required Qualifications:

  • 5+ years of professional software development experience
  • Strong programming skills and system architecture experience
  • Experience with full software development lifecycle
  • Leadership experience as a mentor or tech lead
  • Deep understanding of Linux and kernel operations

The position is ideal for candidates passionate about solving complex problems in AI/ML infrastructure, with a focus on performance optimization and scalability. You'll be working in an environment that values innovation, mentorship, and professional growth while contributing to cutting-edge technology that powers AWS services.

Last updated 8 days ago

Responsibilities For Sr. Software Development Engineer, HPC/ML Networking Engineer, Annapurna Labs

  • Develop and optimize networking solutions for ML and HPC workloads
  • Work on collective operations for AI scaling
  • Collaborate with infrastructure experts and hardware engineers
  • Mentor junior engineers and contribute to team growth
  • Design and implement high-performance distributed systems
  • Work directly with HPC and ML customers

Requirements For Sr. Software Development Engineer, HPC/ML Networking Engineer, Annapurna Labs

  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience
  • 5+ years of leading design or architecture experience
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Strong knowledge of Linux and kernel operations
  • Experience with C/C++ programming

Benefits For Sr. Software Development Engineer, HPC/ML Networking Engineer, Annapurna Labs

  • Medical Insurance
  • 401k


Jobs Related To Annapurna Labs (U.S.) Inc. Sr. Software Development Engineer, HPC/ML Networking Engineer, Annapurna Labs

Senior Software Engineer - Distributed Data Systems

Senior Software Engineer position at Databricks focusing on building distributed data systems, including Apache Spark, Delta Lake, and high-performance data processing engines.

Senior Software Engineer - Storage

Senior Software Engineer position at Roblox focusing on large-scale distributed storage systems, caching, and queue management with competitive compensation and benefits.

Senior Software Engineer - Distributed Systems

Senior Software Engineer role at Datadog focusing on distributed systems, building scalable data pipelines processing billions of events, using Go, Java, Rust, and modern open-source technologies.

Software Engineer with Systems Depth

Senior Software Engineer role at Datadog focusing on systems infrastructure and tooling, offering $130K-$300K salary plus benefits in Denver, CO.

Senior Software Engineer, Distributed Backend

Senior Software Engineer position at Roku focusing on distributed backend systems for advertising platform, requiring 10+ years of experience in building scalable, real-time systems.