Cloudflare is seeking a Systems Reliability Engineer (SRE) to join its Edge platform team, which operates across more than 320 cities in over 120 countries. This role sits at the intersection of systems, network, and software engineering, and focuses on maintaining and improving Cloudflare's vast global network infrastructure.
The position requires a strong background in automation, scalability, and operational excellence. Working in a "follow the sun" model across global offices, you'll be responsible for building tools to enhance service availability, performance, and operational efficiency. The role demands a passionate curiosity about Internet fundamentals, combined with strong knowledge of networking, Linux, and TLS, along with coding abilities in languages like Go, Rust, or Python.
As an SRE at Cloudflare, you'll be part of a team that manages the immediate state and functionality of Cloudflare's worldwide platform. You'll work with various monitoring, alerting, and diagnostic tools while continuously improving the platform's capabilities. The role involves owning a wide portfolio of applications and services, maintaining a tight feedback loop between development and operations.
The ideal candidate should have at least 3 years of experience in an SRE role or similar position, with strong Linux systems experience and software development skills. Knowledge of distributed systems, network protocols, and system design trade-offs is essential. Experience with tools like Nginx, PostgreSQL, Docker, Prometheus, and Grafana would be advantageous.
This is an excellent opportunity to join a high-performing team at a company that's helping build a better Internet. Cloudflare's mission extends beyond commercial success, with initiatives like Project Galileo protecting journalism and civil society organizations, and the Athenian Project securing election websites. The company values diversity and inclusiveness, seeking curious and empathetic individuals committed to personal growth and learning.