Annapurna Labs, a crucial part of AWS, is seeking an experienced engineer to work on distributed AI/ML systems. The role focuses on collective operations, the fundamental operations that enable AI workloads to scale across multiple accelerators and servers. The position involves low-level C/C++ work and requires solid knowledge of Linux, kernels, and writing performant code.
The team is part of AWS's infrastructure development, where every EC2 instance runs on hardware designed by Annapurna Labs. You'll work alongside infrastructure experts, hardware engineers, RTL engineers, scientists, and architects in an international environment. The role offers significant opportunities to work on AI/ML technology while maintaining work-life balance.
The position offers competitive compensation ranging from $129,300 to $223,600, depending on location, plus equity and comprehensive benefits. You'll work in a collaborative environment that values mentorship, knowledge-sharing, and career growth. The team spans a range of experience levels and tenures, with senior members providing one-on-one mentoring and thorough code reviews.
Key responsibilities include developing distributed AI/ML systems, implementing collective operations for AI scaling, writing performant C/C++ code, and collaborating across disciplines. The role requires 3+ years of professional software development experience and strong Linux knowledge; experience with embedded systems and high-speed networking is preferred.
This position is an opportunity to work at the forefront of AI/ML technology, developing features for the largest clusters and AI models while working with major customers. The team emphasizes continuous learning, professional growth, and work-life harmony, which makes it a good fit for engineers passionate about solving complex problems in the AI/ML space.