Avoid the Site Reliability team

Site Reliability Engineer
Former Employee
Worked at Salesforce for 4 years
April 8, 2015
1.0
Doesn't Recommend, Neutral Outlook, No CEO Opinion
Pros
  • Decent salary
  • Health insurance
  • Nice equipment
  • Good outings
Cons

These cons are related to the Site Reliability structure. Some other structures may have similar issues.

To begin with, there is no SRE job per se; it's really a NOC position. Originally, Salesforce had a NOC to handle incidents and the like. Under new management, the goal of the team changed: the new goal was to automate their way out of trouble. Unfortunately, this got as far as changing the name of the team to Site Reliability but not much else. When I interviewed (a lengthy process indeed), I was under the impression that the majority of the work would be engineering and analysis, with firefighting and root cause analysis thrown in. This piqued my interest, but when I joined, the actual job was very different.

Picture this: You are one of a team of 4 or 5 "SREs" (I emphasize the quotes), and each of you handles a different role each day (10-hour days, 4 days a week, plus weekend work, which sucks the life out of you).

The first role, which is the worst, is the console operator. You are made to watch a console with many different alerts coming in from thousands of systems. It's your job to watch this for 10 hours, click and confirm each alert (hundreds in a day), and create tickets and escalate as necessary. If you miss an alert that escalates into an issue, be prepared to be chewed out by your manager and anyone else who wants to blame you. Never mind the fact that you had been staring at several hundred alerts for the past 5 hours and were just slightly tired!

The second role does the same as above but stares at email and the chatrooms for any engineers who want to execute a change (SREs don't execute changes; they're the gatekeepers). Super boring!

The third, fourth, and fifth roles handle escalations from the others, although anything involving networks or databases goes straight to different teams. The most complex issue you would probably investigate is a server falling over. There is little time for actual software engineering, as much of the time you will be closing the many, many, many, many silly tickets that were opened. If you do get to carry out some scripting, be prepared to get pulled off to cover for incidents (of which there are plenty) or time off, or to jump through hoops to get your change implemented. If you get to do this role, then you will probably spend most of your time looking at job sites.

Plus, Salesforce is so big, someone is probably working on your idea. Want to fix monitoring? There's a team for that (M&M). Want to build automation? There's a team for that (DCA). Want to build tools for your team? There's a team for that (SRE Tools). All that is left is incident management, which itself deserves an entire post.

I realized my mistake in joining the team after 2 months. It took me a long time to get out, as I had lost valuable skills while there. If you want to join Salesforce, then I would recommend the Operations Engineering division.

Advice to Management

Hire a NOC to do NOC work. Hire an SRE to do SRE work.

Really figure out what you want the SRE team to do and what SRE actually means. The number of quality staff who have left because of this mess is phenomenal.

