Principal Site Reliability Engineer

Salesforce

Global leader in CRM and enterprise cloud computing solutions.

San Francisco, CA, USA • Seattle, WA, USA

$223,000 - $323,400

Site Reliability

Principal Software Engineer

In-Person

5,000+ Employees

15+ years of experience

Enterprise SaaS · Cloud

This job posting may no longer be active.

Description For Principal Site Reliability Engineer

Salesforce is seeking a Principal Site Reliability Engineer to join their Availability Engineering teams. This role is crucial in driving 'best in class' availability across their multi-substrate engineering platform that serves tens of millions of users. The position requires deep expertise in large-scale systems and concurrency, with a focus on crafting highly available solutions.

As a Principal SRE, you'll work with delivery teams to implement and maintain resilient applications deployed across thousands of compute nodes in multiple data centers. The role involves championing resiliency best practices, working with multiple cloud platforms (AWS, GCP, Azure, and Alibaba), and contributing to open-source technologies.

The ideal candidate will bring 15+ years of software development experience, with at least 5 years in a leadership role. You'll be responsible for reverse engineering solutions, defining availability improvement projects, and maintaining critical infrastructure services. This position offers the opportunity to work on complex technical challenges while ensuring system reliability for one of the world's leading enterprise software companies.

You'll be part of a specialist unit focused on availability and resilience, where you'll have the chance to influence architectural decisions, mentor team members, and drive innovation in system availability. The role combines technical leadership with hands-on development, requiring both strategic thinking and practical implementation skills.

This is an excellent opportunity for a seasoned engineer who is passionate about system reliability, enjoys solving complex distributed systems challenges, and wants to make a significant impact on a platform that powers businesses worldwide. The position offers competitive compensation and the chance to work with cutting-edge technologies in a collaborative environment.

Last updated 5 months ago

Responsibilities For Principal Site Reliability Engineer

Embed with delivery teams in a lead capacity, focusing on corrective and proactive availability measures
Design, develop, debug, and operate resilient applications across distributed systems
Champion resiliency best practices including observability tool integration and auto-scaling
Develop Infrastructure-as-Code using Terraform
Build/integrate with APIs and microservices on containerization frameworks
Resolve complex technical issues and drive innovations for system availability
Participate in on-call rotation to address complex problems in real-time
Balance live runtime management, feature delivery, and retirement of technical debt

Requirements For Principal Site Reliability Engineer

Java · Python · Kubernetes

Related technical degree required (master's preferred)
15+ years of hands-on software development experience
5+ years in a Tech Lead, Principal or Architect capacity
Mastery of object-oriented languages such as Java, Golang, Apex, or Python
Deep experience with core web technologies: HTTP, JSON, REST, XML
Proficiency with databases including Oracle or other relational/NoSQL solutions
Experience owning and operating multiple instances of critical services
Subject-matter expertise in service ownership best practices
Thorough knowledge of Agile development methodology
