
Senior Site Reliability Engineer

Leading provider of customer relationship management (CRM) software, helping companies connect with customers through AI, Data, and CRM solutions.
Site Reliability
Senior Software Engineer
Hybrid
5,000+ Employees
3+ years of experience
Enterprise SaaS · AI

Description For Senior Site Reliability Engineer

Salesforce, the industry leader in CRM solutions, is seeking a Senior Site Reliability Engineer to join its platform team. The role is central to managing the company's multi-substrate Kubernetes and microservices platform, which powers Core CRM and various applications across Salesforce. The position offers the opportunity to work with cutting-edge technology and shape the future of cloud infrastructure.

The ideal candidate will maintain and improve a large infrastructure of 1000+ clusters running technologies including Kubernetes, Docker, and service mesh. The role combines hands-on technical work with strategic thinking: you'll implement AIOps automation, monitoring, and self-healing mechanisms to reduce MTTR and operational toil.

The position offers exposure to large-scale distributed systems and the chance to work with an innovative team of developers and architects. You'll be at the forefront of cloud-native and AI-driven operational practices, helping build a reliable, self-healing, and scalable service mesh infrastructure.

Working at Salesforce means joining a company that believes in business as a platform for change. The company maintains a strong commitment to equality and non-discrimination, ensuring a supportive and inclusive work environment. The role offers competitive compensation, comprehensive benefits, and the opportunity to work with some of the best minds in the industry.

This is an excellent opportunity for an experienced SRE professional who wants to make a significant impact while working with modern technologies and practices in a company that values innovation, equality, and professional growth.


Responsibilities For Senior Site Reliability Engineer

  • Ensure high availability for microservices supporting service mesh and ingress gateway on 1000+ clusters
  • Contribute code to drive service availability improvements
  • Implement monitoring and metrics with Prometheus, Grafana and other frameworks
  • Drive automation efforts in Python/Golang/Puppet/Jenkins
  • Improve CI/CD pipelines built on Terraform, Spinnaker and Argo
  • Implement AIOps automation and self-healing mechanisms
  • Collaborate with various Infrastructure teams across Salesforce
  • Evaluate new technologies as needed

Requirements For Senior Site Reliability Engineer

Kubernetes
Go
Python
Redis
PostgreSQL
  • 3+ years of experience in SRE/DevOps/Systems Engineering roles
  • Experience operating large-scale Kubernetes cluster management systems
  • Strong working experience with Kubernetes, Docker, Container Orchestration, Service Mesh, Ingress Gateway
  • Good knowledge of network technologies (TCP/IP, DNS, TLS, HTTP proxies, Load Balancers)
  • Strong experience with observability tools such as Prometheus, Grafana, Splunk, and Elasticsearch
  • Strong working experience with Linux Systems Administration
  • Good experience with scripting/programming languages such as Python and Go
  • Experience with AWS, Terraform, Spinnaker, ArgoCD
  • Excellent problem-solving, analytical and communication skills

Benefits For Senior Site Reliability Engineer

Medical Insurance
401k
  • Comprehensive benefits package available at salesforcebenefits.com
