Google is seeking a Staff Software Engineer to lead machine learning performance optimization for its TPU (Tensor Processing Unit) systems. This role is part of the ML, Systems, and Cloud AI organization, which is responsible for the infrastructure powering Google's services and Cloud AI offerings. The position focuses on maximizing efficiency for ML/AI workloads, particularly Large Language Models (LLMs).
The ideal candidate will work at the intersection of machine learning infrastructure and performance optimization, driving improvements in how Google's ML models are trained and served across its TPU fleet. Key responsibilities include maintaining LLM benchmarks, implementing optimization techniques, and collaborating with product teams to solve performance challenges.
This is an opportunity to impact Google's next-generation AI technologies, working with cutting-edge hardware like TPUs and software frameworks like TensorFlow and JAX. The role offers competitive compensation including base salary, bonus, equity, and comprehensive benefits.
The position requires deep expertise in both software engineering and machine learning systems, with opportunities to shape the future of AI infrastructure at scale. The engineer will work with teams across Google to optimize critical ML workloads that power products used by billions of users worldwide.