Principal Site Reliability Engineer

Global leader in CRM and enterprise cloud computing solutions.
$223,000 - $323,400
Site Reliability
Principal Software Engineer
In-Person
5,000+ Employees
15+ years of experience
Enterprise SaaS · Cloud
Description For Principal Site Reliability Engineer

Salesforce is seeking a Principal Site Reliability Engineer to join its Availability Engineering teams. The role is central to driving "best in class" availability across the company's multi-substrate engineering platform, which serves tens of millions of users. The position requires deep expertise in large-scale systems and concurrency, with a focus on designing highly available solutions.

As a Principal SRE, you'll work with delivery teams to implement and maintain resilient applications deployed across thousands of compute nodes in multiple data centers. The role involves championing resiliency best practices, working with multiple cloud platforms (AWS, GCP, Azure, and Alibaba Cloud), and contributing to open-source technologies.

The ideal candidate will bring 15+ years of software development experience, with at least 5 years in a leadership role. You'll be responsible for reverse engineering solutions, defining availability improvement projects, and maintaining critical infrastructure services. This position offers the opportunity to work on complex technical challenges while ensuring system reliability for one of the world's leading enterprise software companies.

You'll be part of a specialist unit focused on availability and resilience, where you'll have the chance to influence architectural decisions, mentor team members, and drive innovation in system availability. The role combines technical leadership with hands-on development, requiring both strategic thinking and practical implementation skills.

This is an excellent opportunity for a seasoned engineer who is passionate about system reliability, enjoys solving complex distributed systems challenges, and wants to make a significant impact on a platform that powers businesses worldwide. The position offers competitive compensation and the chance to work with cutting-edge technologies in a collaborative environment.

Last updated 5 months ago

Responsibilities For Principal Site Reliability Engineer

  • Embed with delivery teams in a lead capacity, focusing on corrective and proactive availability measures
  • Design, develop, debug, and operate resilient applications across distributed systems
  • Champion resiliency best practices including observability tool integration and auto-scaling
  • Develop Infrastructure-as-Code using Terraform
  • Build and integrate with APIs and microservices running on containerization frameworks
  • Resolve complex technical issues and drive innovations for system availability
  • Participate in an on-call rotation to address complex problems in real time
  • Balance live runtime management, feature delivery, and retirement of technical debt

Requirements For Principal Site Reliability Engineer

Java · Python · Kubernetes
  • Related technical degree required (master's preferred)
  • 15+ years of hands-on software development experience
  • 5+ years in a Tech Lead, Principal, or Architect capacity
  • Mastery of object-oriented languages such as Java, Go (Golang), Apex, and Python
  • Deep experience with core web technologies: HTTP, JSON, REST, XML
  • Proficiency with databases such as Oracle or other relational and NoSQL systems
  • Experience owning and operating multiple instances of critical services
  • Subject-matter expertise in service ownership best practices
  • Thorough knowledge of Agile development methodology