Staff Software Engineer, Site Reliability Engineering

Google is a global technology leader that specializes in internet-related services and products.
Site Reliability
Staff Software Engineer
In-Person
5,000+ Employees
8+ years of experience
Enterprise SaaS · AI

Description For Staff Software Engineer, Site Reliability Engineering

Site Reliability Engineering (SRE) at Google Cloud combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As a Staff Software Engineer in SRE, you'll ensure that Google Cloud's services have reliability, uptime appropriate to customers' needs, and a fast rate of improvement. You'll work on optimizing existing systems, building infrastructure, and eliminating work through automation.

The role requires expertise in coding, algorithms, complexity analysis, and large-scale system design. You'll manage complex challenges of scale unique to Google Cloud while working in a culture that values diversity, intellectual curiosity, problem-solving, and openness.

Key responsibilities include engaging in the entire lifecycle of services, supporting services pre-launch, scaling systems sustainably, working on critical Google Cloud services, and solving operations problems using software engineering principles. You'll collaborate with developer teams on design, architecture, and processes.

You'll be part of the Technical Infrastructure team, which develops and maintains Google's data centers and builds the next generation of Google platforms. This team keeps Google's networks running smoothly so that users get the best and fastest experience possible.

Ideal candidates will have experience in computing, distributed systems, storage, or networking, with strong skills in designing, analyzing, and troubleshooting large-scale distributed systems. The ability to debug, optimize code, and automate routine tasks is essential, along with excellent problem-solving and communication skills.

Join Google's SRE team to work on meaningful projects, collaborate with diverse perspectives, and contribute to the architecture that powers Google's vast product portfolio.

Last updated 2 days ago

Responsibilities For Staff Software Engineer, Site Reliability Engineering

  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
  • Work on the availability, scalability, efficiency and latency of some of Google Cloud's most critical services
  • Solve operations problems by using software engineering principles and best practices
  • Collaborate with the developer teams on design, architecture and processes

Requirements For Staff Software Engineer, Site Reliability Engineering

Java
Python
Go
  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 8 years of experience with data structures or algorithms
  • 5 years of experience with software development in one or more programming languages
  • 3 years of experience leading projects and designing, analyzing, and troubleshooting distributed systems

Benefits For Staff Software Engineer, Site Reliability Engineering

Equity
  • Equal opportunity employer
  • Accommodation for applicants with special needs
