Cisco ThousandEyes is seeking a Senior Site Reliability Engineer to join their Production Engineering team. ThousandEyes is a leading Digital Experience Assurance platform that helps organizations deliver seamless digital experiences across networks. The role focuses on designing and managing large-scale, highly available distributed systems in the cloud, working directly with application development teams to enhance platform reliability, performance, and security.
The ideal candidate will have expert-level knowledge of Kubernetes and its ecosystem, strong proficiency in Python or Go, and a deep understanding of cloud providers, especially AWS. They will be responsible for identifying and solving problems that stand in the way of operational excellence, implementing scalable solutions, and maintaining a growing infrastructure with an emphasis on automation and code-driven operations.
This is a hybrid position based in Oeiras, Portugal, requiring one day per week in the office. The role involves participating in 24/7 incident response, working with cloud-native tools such as Prometheus, Istio, and Argo CD, and collaborating closely with development teams to optimize service architecture for availability and performance.
The position offers comprehensive benefits, including medical, dental, and vision insurance, a 401(k) plan with company match, disability coverage, and various time-off benefits. Cisco values diverse perspectives and encourages applications from candidates with varied backgrounds, emphasizing potential over traditional qualifications.
Working at Cisco ThousandEyes means joining a team at the forefront of network monitoring and digital experience assurance, with opportunities to work on challenging technical problems at scale while contributing to a product that helps organizations maintain reliable digital services.