The Annapurna Labs team at Amazon Web Services (AWS) builds AWS Neuron, the software development kit used to accelerate deep learning and GenAI workloads on Amazon's custom machine learning accelerators, Inferentia and Trainium. The Acceleration Kernel Library team is at the forefront of maximizing performance for AWS's custom ML accelerators.
As an ML Kernel Performance Engineer, you'll work at the hardware-software boundary, crafting high-performance kernels for ML functions and ensuring optimal performance for customer workloads. The role combines deep hardware knowledge with ML expertise to push the boundaries of AI acceleration. The AWS Neuron SDK includes an ML compiler, runtime, and application framework that integrates seamlessly with popular ML frameworks such as PyTorch.
Working within the Neuron Compiler organization, you'll collaborate across multiple technology layers, from frameworks and compilers to runtime and collectives. You'll not only optimize current performance but also contribute to future architecture designs, working directly with customers to enable their models and tune their workloads.
The position offers competitive compensation ranging from $129,300 to $223,600 depending on location, plus equity, sign-on payments, and comprehensive benefits. You'll join a diverse, inclusive team that values work-life balance and provides extensive opportunities for mentorship and career growth. The role is based in Cupertino, CA, where you'll work with cutting-edge technology at the intersection of machine learning, high-performance computing, and distributed architectures.
This is an opportunity to shape the future of AI acceleration technology while working in a startup-like environment within AWS, where innovation and experimentation are encouraged, and your contributions will have direct impact on global customers' ML workload performance.