Principal Site Reliability & Availability Engineer

Salesforce

Salesforce is a leading customer relationship management (CRM) platform and cloud computing company.

San Francisco, CA, USA • Bellevue, WA, USA

$223,000 - $323,400

Site Reliability

Principal Software Engineer

Hybrid

5,000+ Employees

15+ years of experience

Enterprise SaaS · Cloud

Description For Principal Site Reliability & Availability Engineer

As a Principal Site Reliability & Availability Engineer at Salesforce, you'll be part of a specialist unit focused on availability and resilience. You'll embed with delivery teams in a lead capacity, creating bandwidth for corrective and proactive availability measures and prioritizing them. Your responsibilities include:

Designing, developing, debugging, and operating resilient applications and platforms deployed across distributed systems running on thousands of compute nodes in multiple data centers.
Championing resiliency best practices, including observability tool integration, horizontal/vertical sizing & auto-scaling, release rollback & recovery workflows, and integration tests.
Using and contributing to open source technology (e.g., Spinnaker, ZooKeeper).
Developing and leveraging Infrastructure-as-Code using Terraform.
Building and integrating with APIs and microservices deployed on container and orchestration platforms such as Kubernetes, Docker, and Mesos.
Resolving complex technical issues and driving innovations to improve system availability, resilience, and performance.
Balancing live runtime management, feature delivery, and retirement of technical debt.
Participating in the team's on-call rotation to address complex problems in real time and maintain high service availability.

Required skills include:

A related technical degree (master's preferred)
15+ years of hands-on software development experience
5+ years in a Tech Lead, Principal, or Architect capacity
Mastery of object-oriented languages like Java, Golang, Apex, or Python
Deep experience with core web technologies and databases
Expertise in service ownership best practices, SLO/SLI/SLA definition, and incident management

Join Salesforce to work on cutting-edge technology and contribute to the reliability and availability of systems used by millions of users worldwide.

Last updated 7 months ago

Responsibilities For Principal Site Reliability & Availability Engineer

Design, develop, debug, and operate resilient applications and platforms
Champion resiliency best practices
Use and contribute to open source technology
Develop Infrastructure-as-Code using Terraform
Build and integrate APIs and microservices
Resolve complex technical issues and drive innovations
Balance live runtime management, feature delivery, and technical debt retirement
Participate in on-call rotation

Requirements For Principal Site Reliability & Availability Engineer

Java · Python · Kubernetes

Related technical degree (master's preferred)
15+ years of hands-on software development experience
5+ years in a Tech Lead, Principal, or Architect capacity
Mastery of object-oriented languages (Java, Golang, Apex, Python)
Deep experience with core web technologies (HTTP, JSON, REST, XML)
Proficiency with databases (Oracle, relational, NoSQL)
Experience with critical infrastructure services
Subject-matter expertise in service ownership best practices
Thorough knowledge of Agile development methodology
