Staff Software Engineer, ML Training Platform

Pinterest

Pinterest is a visual discovery platform where millions of people find inspiration and plan for what matters most in their lives.
San Francisco Bay Area, USA
$166,694 - $342,844
Machine Learning · Backend · Cloud
Staff Software Engineer
Hybrid
7+ years

Description

Pinterest is seeking an experienced Staff Software Engineer to join its ML Training Infrastructure team and lead its technical strategy. As part of the ML Platform team in Data Engineering, you'll work on foundational tools and infrastructure used by hundreds of ML engineers across Pinterest, in areas including recommendations, ads, visual search, growth/notifications, and trust and safety.

The role involves implementing scalable solutions for ML workloads, leading key projects, setting multi-year roadmaps, collaborating with internal clients, forging partnerships, and mentoring engineers. You'll work on efforts spanning adoption, efficiency, performance, algorithms, UX, and the core infrastructure that enables the scheduling of ML workloads. The position offers the opportunity to make a significant impact on Pinterest's ML infrastructure and contribute to the company's mission of helping people find inspiration and create a life they love.

The ideal candidate has extensive experience in software engineering and machine learning, with a focus on ML infrastructure or Batch Compute infrastructure. Strong technical leadership, an understanding of High Performance Computing, and experience with Python and cloud technologies are essential for success in this role.

Responsibilities

  • Implement cost-effective and scalable solutions for ML training and inference workloads on compute platforms like Kubernetes
  • Lead and contribute to key projects like GPU sharing, intelligent resource management, capacity planning, and fault-tolerant training
  • Lead the technical strategy and set the multi-year roadmap for ML Training Infrastructure
  • Collaborate with internal clients, ML engineers, and data scientists to address concerns and enable successful implementation of customer use cases
  • Forge strong partnerships with tech leaders to develop a comprehensive technical roadmap
  • Mentor engineers within the team and demonstrate technical leadership

Requirements

Python · Kubernetes · Java

  • 7+ years of experience in software engineering and machine learning
  • Focus on building and maintaining ML infrastructure or Batch Compute infrastructure
  • Technical leadership experience, devising multi-quarter technical strategies
  • Strong understanding of High Performance Computing and/or parallel computing
  • Ability to drive cross-team projects and understand internal customers
  • Strong experience in Python and/or other programming languages such as C++ and Java
  • Experience with GPU programming, containerization, and orchestration technologies (preferred)
  • Experience with cloud data processing technologies and ML frameworks (bonus)

Benefits

  • Flexible work model (PinFlex)
  • Equity compensation
  • Competitive salary range
