Principal Site Reliability & Availability Engineer

Salesforce is a leading customer relationship management (CRM) platform and cloud computing company.
$223,000 - $323,400
Site Reliability
Principal Software Engineer
Hybrid
5,000+ Employees
15+ years of experience
Enterprise SaaS · Cloud

Description For Principal Site Reliability & Availability Engineer

As a Principal Site Reliability & Availability Engineer at Salesforce, you'll be part of a specialist unit focused on availability and resilience. You'll embed with delivery teams in a lead capacity, creating bandwidth for corrective and proactive availability measures and prioritizing them. Your responsibilities include:

  • Designing, developing, debugging, and operating resilient applications and platforms deployed across distributed systems running on thousands of compute nodes in multiple data centers.
  • Championing resiliency best practices, including observability tool integration, horizontal/vertical sizing & auto-scaling, release rollback & recovery workflows, and integration tests.
  • Using and contributing to open source technology (e.g., Spinnaker, ZooKeeper).
  • Developing and leveraging Infrastructure-as-Code using Terraform.
  • Building and integrating with APIs and microservices deployed on container platforms such as Kubernetes, Docker, and Mesos.
  • Resolving complex technical issues and driving innovations to improve system availability, resilience, and performance.
  • Balancing live runtime management, feature delivery, and retirement of technical debt.
  • Participating in the team's on-call rotation to address complex problems in real time and maintain high service availability.

Required skills include:

  • A related technical degree (master's preferred)
  • 15+ years of hands-on software development experience
  • 5+ years in a Tech Lead, Principal, or Architect capacity
  • Mastery of object-oriented languages like Java, Golang, Apex, or Python
  • Deep experience with core web technologies and databases
  • Expertise in service ownership best practices, SLO/SLI/SLA definition, and incident management

Join Salesforce to work on cutting-edge technology and contribute to the reliability and availability of systems used by millions of users worldwide.

Last updated 7 months ago

Responsibilities For Principal Site Reliability & Availability Engineer

  • Design, develop, debug, and operate resilient applications and platforms
  • Champion resiliency best practices
  • Use and contribute to open source technology
  • Develop Infrastructure-as-Code using Terraform
  • Build and integrate APIs and microservices
  • Resolve complex technical issues and drive innovations
  • Balance live runtime management, feature delivery, and technical debt retirement
  • Participate in on-call rotation

Requirements For Principal Site Reliability & Availability Engineer

Java · Python · Kubernetes
  • Related technical degree (master's preferred)
  • 15+ years of hands-on software development experience
  • 5+ years in a Tech Lead, Principal, or Architect capacity
  • Mastery of object-oriented languages (Java, Golang, Apex, Python)
  • Deep experience with core web technologies (HTTP, JSON, REST, XML)
  • Proficiency with databases (Oracle, relational, NoSQL)
  • Experience with critical infrastructure services
  • Subject matter expertise in service ownership best practices
  • Thorough knowledge of Agile development methodology
