Staff Software Engineer, Machine Learning Performance

Google is a leading technology company that develops innovative products and services used by billions of users worldwide.
Machine Learning
Staff Software Engineer
In-Person
5,000+ Employees
8+ years of experience
AI
This job posting may no longer be active.

Description For Staff Software Engineer, Machine Learning Performance

Google's TPU Performance team is seeking a Staff Software Engineer to drive performance and efficiency for AI/ML training workloads. This role focuses on Large Language Models, including Google DeepMind Gemini, Bard, and Cloud LLM APIs. The ideal candidate will have extensive experience in software development, machine learning algorithms, and performance optimization. They will identify and maintain LLM benchmarks, drive TensorFlow/JAX performance on TPUs, and work with product teams to solve LLM performance challenges. The role requires expertise in distributed development and large-scale data processing, and the ability to lead project teams in a complex, matrixed organization. This position offers the opportunity to work on cutting-edge AI technologies and shape the future of machine learning performance at Google.

Last updated 8 months ago

Responsibilities For Staff Software Engineer, Machine Learning Performance

  • Focus on performance analysis and optimization for Large Language Models (Google DeepMind Gemini, Bard, Search Magi, Cloud LLM APIs)
  • Identify and maintain Large Language Model (LLM) training and serving benchmarks that are representative of Google production, the industry, and the machine learning community; use them to identify performance opportunities, drive TensorFlow/JAX out-of-the-box performance on TPUs, and gate TF/JAX releases
  • Engage with Google product teams to solve their LLM performance problems, such as onboarding new LLMs and products onto Google's new TPU hardware and enabling LLMs to train efficiently at very large scale (i.e., thousands of TPUs)
  • Explore model and data efficiency techniques, such as new ML model architectures, optimizers, and training techniques that solve an ML task more efficiently, and new techniques that reduce the labeled or unlabeled data needed to train a model to a target accuracy

Requirements For Staff Software Engineer, Machine Learning Performance

Python
  • Bachelor's degree or equivalent practical experience
  • 8 years of experience in software development and with data structures/algorithms
  • 5 years of experience testing and launching software products, and 3 years of experience with software design and architecture
  • 5 years of experience with machine learning algorithms and tools (e.g., TensorFlow), artificial intelligence, deep learning, or natural language processing