KLA, a global leader in semiconductor process control technology, is seeking an MLOps Site Reliability Engineer to join its team. This role sits at the intersection of machine learning operations and infrastructure reliability, focusing on building and maintaining robust systems for ML workflows. The position offers an opportunity to work with cutting-edge technologies in semiconductor manufacturing, where KLA invests heavily in R&D (15% of sales).
The role involves collaborating with data scientists and ML engineers to ensure the reliable deployment and operation of machine learning systems. You'll be responsible for designing and implementing scalable infrastructure, managing CI/CD pipelines, and ensuring the performance and security of ML systems. The position requires expertise in modern DevOps practices, cloud platforms, and containerization technologies.
KLA's Global Products Group (GPG) and Central Engineering organization, with its 9 Centers-of-Excellence, provide a rich environment for innovation and technical growth. The company's products are crucial to the manufacturing of virtually every electronic device, from smartphones to smart cars.
The ideal candidate will have a strong background in Site Reliability Engineering, combined with knowledge of machine learning concepts and workflows. This role offers the opportunity to make a significant impact on KLA's ML infrastructure while working with a global team of experts in various technical disciplines.
Benefits include competitive compensation and a family-friendly total rewards package, though the posting does not give specific details. KLA is an equal opportunity employer committed to providing reasonable accommodations and maintaining an inclusive environment.