Senior Site Reliability Engineer, Production Engineering

A Digital Experience Assurance platform that empowers organizations to deliver seamless digital experiences across networks using AI and cloud telemetry data.
Site Reliability
Senior Software Engineer
Hybrid
5+ years of experience
Enterprise SaaS

Description For Senior Site Reliability Engineer, Production Engineering

Cisco ThousandEyes, a leading Digital Experience Assurance platform, is seeking a Senior Site Reliability Engineer to join its Production Engineering team in London. The role offers the chance to work with modern cloud technologies and contribute to a platform that is deeply integrated across Cisco's extensive technology portfolio.

You will be responsible for designing and managing large-scale, highly available distributed systems in the cloud, working directly with application development teams to improve the reliability, performance, and security of the platform. The role involves modern technologies including Kubernetes, AWS, and various CNCF solutions.

Key responsibilities include optimizing architecture for availability and performance, implementing scalable operations tooling, and participating in incident response. You'll be instrumental in automating production operations and developing solutions for platform scaling across multiple regions.

The position requires expert-level knowledge of Kubernetes, proficiency in Python or Go, and a strong understanding of cloud platforms, particularly AWS. The hybrid work arrangement requires at least one day per week in the London office, so the role combines challenging technical problems with work-life balance.

Cisco ThousandEyes values diverse perspectives and encourages applications from candidates with varied backgrounds, emphasizing potential over traditional qualifications. The company offers a collaborative environment built around a platform that helps organizations deliver seamless digital experiences.

Last updated a month ago

Responsibilities For Senior Site Reliability Engineer, Production Engineering

  • Collaborate with software engineers to optimize architecture and services
  • Design and implement scalable operations tooling
  • Design, deploy, and maintain AWS cloud-native services
  • Participate in 24x7 incident response and on-call rotation
  • Use and expand CNCF solutions like Kubernetes, Service Mesh, Prometheus
  • Automate production operations
  • Develop automation solutions for scalable service and platform operations
  • Identify and provide solutions to common obstacles
  • Manage rapidly growing infrastructure

Requirements For Senior Site Reliability Engineer, Production Engineering

Kubernetes, Python, Go, Linux
  • Expert-level knowledge of Kubernetes and its ecosystem
  • Proficiency in software development with Python or Go
  • In-depth knowledge of cloud providers, preferably AWS
  • Strong understanding of Unix/Linux systems
  • Knowledge of Site Reliability principles
  • 5+ years of experience in a related role
  • Excellent communication and documentation skills
  • Strong sense of ownership, drive, and attention to detail
