The AWS Hardware Engineering team is seeking a Systems Development Engineer to join its Generative AI & ML Servers initiative. This role focuses on building the backbone of AWS's Generative AI cloud infrastructure and developing next-generation platforms for AI training and inference. The position involves creating server designs that lead the industry in frugality and operational excellence, both critical to AWS's success and to its millions of customers.
The ideal candidate is an innovative self-starter with comprehensive knowledge of the full technical stack, from bare-metal server hardware to userland software. You'll work on delivering continuous price-performance improvements for AI model training, including multi-billion-parameter LLMs. The role requires excellent system debugging skills and the ability to find interaction issues between server components.
As part of the Hardware Engineering AI/ML development team, you'll collaborate with people in many roles across AWS, including SDEs, Hardware Engineers, and TPMs, as well as with other teams. The position offers the opportunity to work on global-scale projects with development teams in Seattle, Cupertino, and Austin, impacting AWS's worldwide datacenter operations.
Key responsibilities include solving complex architectural problems, owning team systems, proactively identifying and addressing deficiencies, and decomposing challenging server system testability issues into manageable components. You'll combine knowledge of hardware, software, system design, x86 architecture, and operations to deliver solutions that directly benefit AWS customers.
The role offers competitive compensation, comprehensive benefits, and the chance to work at the forefront of cloud computing technology. You'll be part of AWS's innovative culture that values learning, diversity, and work-life harmony, with opportunities for career growth and mentorship in a fast-paced, dynamic environment.