Zefr, a leading global technology company, is seeking a Senior Site Reliability Engineer to join their team in Marina del Rey, CA. This role combines traditional SRE responsibilities with a special focus on Machine Learning infrastructure, making it a unique opportunity for experienced engineers passionate about both reliability and AI systems.
The position requires expertise in cloud infrastructure, CI/CD, and core SRE concepts, with a particular emphasis on supporting ML workloads. You'll be working with a modern tech stack that includes GCP, AWS, Kubernetes, and ML-specific tools such as Triton Inference Server and HuggingFace. The role offers a salary range of $150,000-$170,000 and comes with comprehensive benefits, including flexible PTO, medical coverage, and 401(k) matching.
As an SRE at Zefr, you'll be responsible for ensuring the reliability and scalability of their AI-powered marketing technology platform. You'll work closely with the Machine Learning team to build and maintain specialized infrastructure for model training and deployment. The position requires 6+ years of cloud infrastructure experience and at least 1 year of ML infrastructure experience.
The company offers a hybrid work environment and emphasizes continuous learning and innovation. They're looking for someone who can both contribute their expertise and grow with the team. This role is well suited to an experienced SRE who wants to work at the intersection of infrastructure and machine learning while helping shape the future of digital marketing technology.