
Lead Site Reliability Engineer

Global leader in CRM software providing cloud-based solutions for sales, service, marketing, and more.
Site Reliability
Staff Software Engineer
In-Person
5,000+ Employees
7+ years of experience
Enterprise SaaS

Description For Lead Site Reliability Engineer

Salesforce is seeking a Lead Site Reliability Engineer to join its SRE team, combining software and systems engineering to build and maintain large-scale distributed systems. This role focuses on ensuring Salesforce services maintain reliability, capacity, and performance at scale. The position involves managing complex challenges unique to Salesforce's infrastructure while using expertise in coding, algorithms, and system design. The SRE team emphasizes a culture of diversity, intellectual curiosity, and problem-solving in a blame-free environment.

The ideal candidate will work on enabling service owners to operate their services safely at scale, whether through observability frameworks, system optimization, or implementing AI/ML solutions. This role requires deep technical expertise in distributed systems, cloud infrastructure, and modern DevOps practices. You'll be responsible for maintaining and improving the reliability of Salesforce's critical services, implementing automation, and driving engineering excellence across teams.

The position offers the opportunity to work with cutting-edge technologies and solve complex problems at massive scale. You'll collaborate with various engineering teams, lead incident response, and drive improvements in system reliability and performance. The role combines hands-on technical work with leadership responsibilities, making it ideal for experienced engineers looking to make a significant impact at a leading enterprise software company.

Working at Salesforce means joining a company that values innovation, customer success, and giving back to the community. The company offers a collaborative environment where you can grow your career while working on technology that impacts millions of users worldwide. If you're passionate about reliability engineering, automation, and building resilient systems at scale, this role provides an excellent opportunity to work with some of the best minds in the industry.

Last updated a day ago

Responsibilities For Lead Site Reliability Engineer

  • Support and scale multi-cloud, multi-region services
  • Build automation and self-healing capabilities
  • Operate and scale monitoring, alerting, and tracing systems
  • Improve CI/CD practices
  • Define and implement SLIs/SLOs with engineering teams
  • Collaborate on integrating AI-driven automation and observability
  • Work within Agile teams
  • Lead post-incident analysis and postmortems
  • Use data to uncover trends and drive platform improvements

Requirements For Lead Site Reliability Engineer

Go
Java
Python
Kubernetes
Linux
  • 7+ years of experience in Python, Go, or Java for automation, tooling, and integration
  • Experience designing, building, and operating large-scale distributed systems
  • Experience in developing and deploying production-grade software applications
  • Strong understanding of software engineering best practices
  • Knowledge of Internet technologies and protocols (TCP/IP, DNS, HTTP, SSL)
  • Experience with API fundamentals (SOAP, REST)
  • Experience with public cloud environments, Kubernetes, and container orchestration
  • Knowledge of microservices, service mesh, and zero-trust infrastructure
  • Strong Linux systems knowledge and troubleshooting skills
  • Experience in fault modeling, chaos engineering, and load testing
