The Annapurna Labs team at Amazon Web Services (AWS) builds AWS Neuron, the software development kit used to accelerate deep learning and GenAI workloads on Amazon's custom machine learning accelerators, Inferentia and Trainium. Within that effort, the Acceleration Kernel Library team is at the forefront of maximizing performance on these accelerators. Working at the hardware-software boundary, our engineers craft high-performance kernels for ML functions, ensuring every FLOP counts for our customers' most demanding workloads.
The AWS Neuron SDK is a comprehensive toolkit that includes an ML compiler, runtime, and application framework, integrating seamlessly with popular ML frameworks like PyTorch. As part of the Neuron Compiler organization, the team works across multiple technology layers, from frameworks and compilers to runtime and collectives, optimizing current performance and contributing to future architecture designs.
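For context, compiling a model through the SDK's PyTorch integration typically looks like the following; this is a minimal sketch assuming the torch-neuronx package is installed on an Inferentia or Trainium instance:

```python
import torch
import torch_neuronx  # AWS Neuron's PyTorch integration (assumes torch-neuronx is installed)

# A small example model; any traceable PyTorch module works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

example_input = torch.rand(1, 128)

# torch_neuronx.trace invokes the Neuron compiler and returns a module
# whose forward pass executes on a NeuronCore instead of the host CPU.
neuron_model = torch_neuronx.trace(model, example_input)
output = neuron_model(example_input)
```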
The role offers a unique opportunity to work at the intersection of machine learning, high-performance computing, and distributed architectures. Engineers collaborate across compiler, runtime, framework, and hardware teams to optimize machine learning workloads for global customers. The position involves designing and implementing high-performance compute kernels, analyzing and optimizing kernel-level performance, and working directly with customers to enable and optimize their ML models.
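As an illustration of what kernel-level work can look like, here is a sketch of an elementwise addition kernel written with the Neuron Kernel Interface (NKI), the SDK's Python-embedded kernel language. The exact API names (nki.jit, nl.load, nl.store) are assumptions based on public NKI documentation, and real kernels must respect on-chip tile-size limits:

```python
from neuronxcc import nki
import neuronxcc.nki.language as nl

@nki.jit
def tensor_add_kernel(a_in, b_in):
    """Elementwise c = a + b on a single tile (shapes assumed to fit
    within the on-chip 128-partition tile limit)."""
    # Allocate the output tensor in device HBM.
    c_out = nl.ndarray(a_in.shape, dtype=a_in.dtype, buffer=nl.shared_hbm)

    # Load inputs from HBM into on-chip memory, compute, and store back.
    a = nl.load(a_in)
    b = nl.load(b_in)
    nl.store(c_out, value=a + b)
    return c_out
```

Performance work then centers on how such loads, stores, and compute operations map onto the accelerator's memory hierarchy and engines.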
The team values work-life balance and offers flexibility in working hours. They embrace diversity and inclusion, with ten employee-led affinity groups reaching 40,000 employees globally. Career growth and mentorship are prioritized, with projects assigned to help team members develop into better-rounded professionals. The hybrid work model allows engineers to choose between full office presence and flexible arrangements near US Amazon offices.
This role is perfect for someone passionate about pushing the boundaries of AI acceleration technology, combining deep hardware knowledge with ML expertise to deliver optimal performance for demanding workloads.