Senior Site Reliability Engineer

Leading global technology company enabling responsible marketing in walled garden social environments through patented AI technology.
Marina Del Rey, CA 90292, USA
$150,000 - $170,000
Site Reliability
Senior Software Engineer
Hybrid
501 - 1,000 Employees
6+ years of experience
AI · Enterprise SaaS

Description For Senior Site Reliability Engineer

Zefr, a leading global technology company, is seeking a Senior Site Reliability Engineer to join its team in Marina del Rey, CA. This role combines traditional SRE responsibilities with a special focus on Machine Learning infrastructure, making it a unique opportunity for experienced engineers passionate about both reliability and AI systems.

The position requires expertise in cloud infrastructure, CI/CD, and core SRE concepts, with a particular emphasis on supporting ML workloads. You'll be working with a modern tech stack including GCP, AWS, Kubernetes, and ML-specific tools such as Triton Inference Server and Hugging Face. The role offers a salary range of $150,000-$170,000 and comes with benefits including flexible PTO, medical coverage, and 401(k) matching.

As an SRE at Zefr, you'll be responsible for ensuring the reliability and scalability of their AI-powered marketing technology platform. You'll work closely with the Machine Learning team to build and maintain specialized infrastructure for model training and deployment. The position requires 6+ years of cloud infrastructure experience and at least 1 year of ML infrastructure experience.

The company offers a hybrid work environment and emphasizes continuous learning and innovation. They're looking for someone who can both contribute their expertise and grow with the team. This role is perfect for an experienced SRE who wants to work at the intersection of infrastructure and machine learning, while helping shape the future of digital marketing technology.

Last updated 3 days ago

Responsibilities For Senior Site Reliability Engineer

  • Support and build systems and tools for engineers to deploy and manage features and models
  • Deploy and support multi-cloud, micro-service architecture with ML workloads
  • Collaborate with ML team to architect secure, resilient, scalable systems
  • Foster DevOps culture and continuous improvement
  • Maintain production environments and monitor ML model performance
  • Participate in 24/7 on-call rotation
  • Debug code at application and infrastructure level
  • Mature CI/CD workflows and release process
  • Propose and review Engineering RFCs

Requirements For Senior Site Reliability Engineer

Python
Kubernetes
Redis
PostgreSQL
Node.js
React
Kafka
  • 6+ years experience designing and managing Cloud Infrastructure in production
  • Production experience with container-based workloads in Kubernetes clusters
  • 1+ year of Machine Learning Infrastructure Development and Operations
  • Knowledge of GitOps and modern CI/CD pipelines
  • Knowledge of IaC and configuration management tools
  • Strong problem-solving experience with a focus on automation
  • Production experience with Monitoring and Observability tools
  • Understanding of Cloud Networking concepts
  • Strong written and verbal communication skills

Benefits For Senior Site Reliability Engineer

Medical Insurance
Dental Insurance
Vision Insurance
401k
Parental Leave
  • Flexible PTO
  • Medical, dental, and vision insurance with FSA options
  • Company-paid life insurance
  • Paid parental leave
  • 401(k) with company match
  • Professional development opportunities
  • 13+ paid holidays
  • Summer Fridays
  • Hybrid work schedule
  • In-office lunches and free food
  • Optional in-person and virtual events
