The interview process for these types of companies is fairly standard. It typically involves an initial screening, followed by two rounds of phone interviews, and finally an on-site interview. In my case, the on-site interview was replaced by a virtual interview due to COVID-19.
TIP:
Go through the second Google SRE book, The Site Reliability Workbook.
How do you make a variable in a shell script available after the script exits (assuming the shell script was sourced)?
How do you change the priority of a running process?
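On the command line the usual answer is renice. As a minimal sketch of the same idea from a program, assuming a Unix-like system, Python's os.setpriority can adjust the nice value of a running process. The helper name renice below is my own, not a library function.

```python
import os


def renice(pid: int, niceness: int) -> int:
    """Set the nice value of a running process and return the value now in effect."""
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    me = os.getpid()
    current = os.getpriority(os.PRIO_PROCESS, me)
    # Lowering priority (a higher nice value) needs no special privileges;
    # raising it (a lower nice value) normally requires root or CAP_SYS_NICE.
    print(renice(me, min(current + 1, 19)))
```

Nice values run from -20 (highest priority) to 19 (lowest), which is why the demo caps the new value at 19.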
Coding test: Parse a (syslog) file to get various fields from the logs and message counts. Associate counts with the processes that logged them.
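I don't have the exact input file, so the following is only a sketch of one way to approach it. It assumes the traditional BSD-style line layout ("Mmm dd hh:mm:ss host process[pid]: message"); the regular expression, field names, and function names are my own choices and would need adjusting for other syslog formats.

```python
import re
from collections import Counter

# Assumed layout: "Mar  3 10:15:01 web01 sshd[1234]: Accepted publickey ..."
LINE_RE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<process>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s?"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def count_by_process(lines):
    """Count how many messages each process logged, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields is not None:
            counts[fields["process"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Mar  3 10:15:01 web01 sshd[1234]: Accepted publickey for alice",
        "Mar  3 10:15:02 web01 kernel: eth0 link up",
        "Mar 13 10:15:03 web01 sshd[1240]: Failed password for bob",
    ]
    for process, count in count_by_process(sample).most_common():
        print(process, count)
```

Passing an open file object to count_by_process works as well, since it only iterates over lines.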
Describe how SSH works.
Describe how curl works. What happens when you call the command? Describe the process of loading libraries, parsing arguments, DNS resolution, etc.
You have gigabytes of data that need to be periodically synced from a producer to a large number of consumers. How do you approach it? Hint: the data set isn't necessarily entirely new each time it needs to be synced, so only sync the data that has changed.
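One way to illustrate the "only sync what changed" hint is a chunk manifest: the producer publishes a hash per fixed-size chunk, and each consumer fetches only the chunks whose hashes differ from its local copy. This is a simplified sketch, not what rsync does (rsync uses rolling checksums to cope with inserted bytes), and the function names and the tiny chunk size are assumptions for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # tiny for illustration; a real chunk size would be far larger


def manifest(data, chunk_size=CHUNK_SIZE):
    """Return one SHA-256 hex digest per fixed-size chunk of data."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old_manifest, new_manifest):
    """Return the indices of chunks the consumer has to fetch."""
    return [
        i for i, digest in enumerate(new_manifest)
        if i >= len(old_manifest) or old_manifest[i] != digest
    ]


def apply_delta(old, new_len, fetched, chunk_size=CHUNK_SIZE):
    """Rebuild the new data from the old copy plus the fetched chunks."""
    chunk_count = (new_len + chunk_size - 1) // chunk_size
    parts = []
    for i in range(chunk_count):
        if i in fetched:
            parts.append(fetched[i])  # chunk that changed upstream
        else:
            parts.append(old[i * chunk_size:(i + 1) * chunk_size])  # reuse local
    return b"".join(parts)[:new_len]


if __name__ == "__main__":
    old = b"aaaabbbbcccc"
    new = b"aaaaBBBBccccdd"
    need = changed_chunks(manifest(old), manifest(new))
    fetched = {i: new[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE] for i in need}
    print(need, apply_delta(old, len(new), fetched) == new)
```

With a large number of consumers, the manifest is small enough to distribute widely, and the changed chunks can be served from caches or mirrors rather than from the producer directly.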
You take over a new service and discover it has no monitoring. What monitoring would you put in place within the first week to ensure the service is working? Within the first month? How do you monitor failures which are local to a region?
You will be asked to role-play a scenario where the number of registrations for a service has dropped to 0 for the past 6 or so hours, setting off an alert. You will have to go through incident response and escalation. You will be asked to write simple reports that are suitable for giving high-level status to a manager.
You will be shown several architecture diagrams and asked various questions, like "What happens when database X goes down?" or "How do you speed up requests from service Y?". Caching plays a big role in almost all responses.
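Since caching comes up so often, it helps to be able to sketch the basic pattern. The following is a minimal cache-aside example with a time-to-live, assuming a slow backend call that is safe to serve slightly stale; the class name TTLCache and its parameters are hypothetical, not from any particular library.

```python
import time


class TTLCache:
    """Cache-aside wrapper: serve from memory until an entry expires."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # function that fetches from the slow backend
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]          # fresh entry: skip the backend
        value = self._loader(key)  # miss or expired: go to the backend
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    def slow_lookup(key):
        time.sleep(0.1)            # stand-in for a database or service call
        return key.upper()

    cache = TTLCache(slow_lookup, ttl_seconds=30)
    print(cache.get("profile"))    # goes to the backend
    print(cache.get("profile"))    # served from the cache
```

In an interview answer, the follow-up questions tend to be about the trade-offs this sketch ignores: invalidation, stale reads, and what happens when the cache itself goes down.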
You will be asked to do live troubleshooting of an Apache (httpd) web service. The recruiter will not give you many details, so it's easy to study the wrong thing here. It turned out that you need to be familiar with the httpd config file and Alias directives. You need to know how to change Linux filesystem permissions, but you can ignore the fact that you are running on Red Hat; you won't need to touch SELinux permissions. Watch out for one problem where two file names are nearly identical, except that one has a hyphen and the other a Unicode dash character. They look very similar in many fonts. Make sure you know how to get a simple GDB backtrace. You will be asked to debug a segfault and work around it (via a simple file rename).
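For the hyphen versus Unicode dash trap, a quick way to tell the two names apart is to list every non-ASCII character in each file name along with its Unicode name. This is a small sketch of that check; the function names and the example file names are made up for illustration.

```python
import os
import unicodedata


def suspicious_chars(name):
    """Return (index, char, Unicode name) for every non-ASCII character in name."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(name)
        if ord(ch) > 127
    ]


def scan_directory(path):
    """Map each file name in path that contains non-ASCII characters to its findings."""
    report = {}
    for name in os.listdir(path):
        found = suspicious_chars(name)
        if found:
            report[name] = found
    return report


if __name__ == "__main__":
    print(suspicious_chars("access-log.conf"))       # plain ASCII hyphen
    print(suspicious_chars("access\u2013log.conf"))  # EN DASH instead of a hyphen
```

On a live box without Python handy, piping the directory listing through a hex dump shows the same thing: the ASCII hyphen is a single byte, while a Unicode dash is a multi-byte UTF-8 sequence.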
You will have to perform a code review of several pieces of code. Focus on logic errors, not stylistic issues. I don't remember all the code samples, but one was about doing file backups, where they manually implemented extension parsing and copied ".1" files over to ".2", etc., without ensuring the copies happened in a safe order, so a file could be overwritten before it had been rotated itself.
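I can't reproduce the code under review, but the ordering issue in the backup sample can be shown with a short sketch. Assuming numbered backups named "file.1", "file.2", and so on, rotating from the highest number down guarantees each destination has been vacated before anything lands on it. The function name and the keep parameter are hypothetical, and this version moves files rather than copying them.

```python
import os
import tempfile


def rotate_backups(path, keep=3):
    """Shift path.1 -> path.2 -> ... -> path.<keep>, then path -> path.1."""
    # Walk from the highest generation down so that each destination has
    # already been moved out of the way before something replaces it.
    for n in range(keep - 1, 0, -1):
        src = f"{path}.{n}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{n + 1}")  # overwrites the oldest copy
    if os.path.exists(path):
        os.replace(path, f"{path}.1")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "data.bak")
        for suffix in ("", ".1", ".2"):
            with open(base + suffix, "w") as fh:
                fh.write("contents of data.bak" + suffix)
        rotate_backups(base, keep=3)
        print(sorted(os.listdir(tmp)))
```

Going in ascending order instead (".1" to ".2" first) would overwrite ".2" before it had been moved to ".3", which is the kind of logic error the reviewers want you to spot.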
The following metrics were computed from 6 interview experiences for the LinkedIn Site Reliability Engineer role in Sunnyvale, California.
LinkedIn's interview process for their Site Reliability Engineer roles in Sunnyvale, California is fairly selective, failing a large portion of engineers who go through it.
Candidates reported very positive feelings about LinkedIn's Site Reliability Engineer interview process in Sunnyvale, California.