
Site Reliability Engineer

Experts in strategic database and analytics services, driving digital transformation and operational excellence since 1997.
Site Reliability
Mid-Level Software Engineer
Remote
3+ years of experience
Enterprise SaaS · AI

Job Description

Pythian, established in 1997, is a multinational company specializing in strategic database and analytics services, with a strong focus on digital transformation and operational excellence. The company is building a next-generation Site Reliability Engineering team and is seeking engineers who excel in fast-paced, problem-solving environments. The role involves designing, deploying, and operating large-scale distributed systems across compute, storage, networking, and AI/ML environments.

As an SRE at Pythian, you'll work with Kubernetes, Istio, and various cloud platforms. The position involves working with both clients and teammates to build resilient, high-performing infrastructure. The company has partnerships with Google Cloud, AWS, Microsoft, Oracle, SAP, and Snowflake.

The role combines technical expertise in infrastructure automation, monitoring, and system optimization with collaborative problem-solving. You'll be part of a team that values continuous learning and professional development, with training allowances and certification opportunities. The position is fully remote and offers comprehensive benefits and the chance to work with some of the industry's best talent.

What makes this role particularly attractive is the combination of technical challenges, working with advanced technologies, and the company's strong focus on employee well-being. Pythian provides all necessary equipment, workspace personalization budgets, and wellness benefits, creating an environment where engineers can thrive both professionally and personally.

Last updated a month ago

Responsibilities For Site Reliability Engineer

  • Operate and optimize Kubernetes clusters, Istio service mesh, and Linux-based systems
  • Automate workflows using Go, Python, and Shell scripting
  • Build monitoring and observability solutions with Prometheus, Grafana, and Loki
  • Troubleshoot complex networking, storage, and system performance issues
  • Partner with AI/ML teams to ensure infrastructure readiness for model training and data pipelines
  • Participate in on-call rotations and postmortem reviews to improve system resilience

Requirements For Site Reliability Engineer

  • Core skills: Go, Python, Kubernetes, Linux
  • Experience with Google Cloud and infrastructure-as-code (IaC) tools such as Terraform
  • Strong knowledge of microservices, containers (Kubernetes, Docker), and networking
  • Hands-on experience with PKI, service mesh, and Linux systems administration
  • SRE mindset with a focus on automation, scalability, and reliability

Benefits For Site Reliability Engineer

  • Education budget
  • Medical insurance
  • Competitive total rewards package
  • Training allowance and professional development opportunities
  • Remote work flexibility
  • Equipment provided including laptop with choice of OS
  • Annual workspace personalization budget
  • Annual wellness budget
  • Paid vacation and sick days
  • Volunteer day off
  • Time to blog during work hours
