For your context: my manager has proposed changing the on-call schedule and wants to expand the on-call team. Our team currently has 3 engineers on a rotating on-call schedule, done on a weekly basis through PagerDuty.
The goal is to increase the number of engineers and the coverage so that each engineer only has to do on-call support one week in every 2 months. The company's engineering team is distributed across North America and South America, with a few QA engineers in India. My manager asked me to see whether it makes sense to run on-call for the production platform support team using a follow-the-sun model:
A follow-the-sun schedule does just as the name suggests: arranges on-call team members based on where they work. This type of arrangement is perfect for remote teams with members located across wide geographic areas.
One employee or another is available for emergencies during the day (i.e., their regular work hours) and your business doesn’t have to schedule a night shift.
Any thoughts on this proposal of Follow-The-Sun?
I like this idea in theory, but it may not work in this case in practice based on where your team is located.
US + South America: UTC-8 through UTC-3, so 6 timezones.
Then there are “a few QA engineers” in India, UTC+5:30.
To follow the sun in the sense of nobody being woken up at night, or more commonly of only being on call during working hours, you need a much broader distribution of personnel, or people with wildly different sleep schedules.
Just to demonstrate:
We don’t want anyone on call before 8a or past 8p local time. I think three 8-hour shifts is actually the right model, but I’m being as generous as possible. Using UTC to measure coverage: 8a-8p in eastern Brazil (UTC-3) is 11:00-23:00 UTC, and 8a-8p Pacific (UTC-8) is 16:00-04:00 UTC, so the Americas together cover 11:00 through 04:00 UTC, or 17 out of 24 hours. The remaining 7 hours, 04:00-11:00 UTC, are 9:30a-4:30p in India. So yes, on paper you can get full coverage, but we’ve made the “few” engineers in India cover that window every single day, while shifts for those in the Americas, where there are presumably many more people, come around far less often. Again… your distribution doesn’t work. Pacific, GMT, and China are spaced 8 hours apart, so if your team were evenly spread across those three places this would be much more tenable.
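If you want to check this kind of coverage arithmetic rather than trust head math, here is a minimal Python sketch. It assumes fixed offsets (UTC-8 for Pacific, UTC-3 for eastern Brazil, UTC+5:30 for India, no daylight saving) and one on-call window per site in local time; the function names and the 30-minute granularity are my own choices for illustration, not anything PagerDuty provides.

```python
MINUTES_PER_DAY = 24 * 60
SLOT = 30  # minutes; fine enough to handle India's half-hour offset


def covered_slots(utc_offsets_hours, start_local=8, end_local=20):
    """Return the UTC slot start times (minutes after midnight) covered by
    at least one site on call from start_local to end_local, local time."""
    covered = set()
    for offset in utc_offsets_hours:
        # Local start hour converted to minutes after midnight UTC.
        start_utc = round((start_local - offset) * 60) % MINUTES_PER_DAY
        for m in range(0, (end_local - start_local) * 60, SLOT):
            covered.add((start_utc + m) % MINUTES_PER_DAY)
    return covered


def uncovered_hours(utc_offsets_hours, start_local=8, end_local=20):
    """Hours of the UTC day that no site covers."""
    all_slots = set(range(0, MINUTES_PER_DAY, SLOT))
    missing = all_slots - covered_slots(utc_offsets_hours, start_local, end_local)
    return len(missing) * SLOT / 60


# Assumed fixed offsets in hours from UTC (daylight saving ignored).
SITES = {"Pacific": -8, "Brazil": -3, "India": 5.5}

print("All three sites, 8a-8p:", uncovered_hours(SITES.values()))
print("Americas only, 8a-8p:", uncovered_hours([SITES["Pacific"], SITES["Brazil"]]))
print("All three sites, 9a-5p:", uncovered_hours(SITES.values(), 9, 17))
```

Note this only tells you whether the clock is covered. It says nothing about how many people share each window, which is the real problem here.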
Maybe slim down the goals and see?
My manager's goal is to get at least 4 people, if not more, on the on-call rotation. The numbers we have in mind would have an engineer doing support for a two-week period (using follow-the-sun) and then not repeating support for 3-6 months.
I don’t follow how the math is working. To have 12-hour follow-the-sun shifts, you need two people on call at any time. To have two-week shifts with 12-24 weeks off, each of those two slots needs 7 to 13 people in rotation, so you’d need between 14 and 26 people evenly divided on opposite sides of the earth. I don’t know how 4 people (do you mean adding 4 to an already large rotation?) gets anywhere close to this.
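To spell out that headcount arithmetic, here is a small Python sketch. It assumes everyone takes equal turns and that "slots" means the number of people on call at the same time (2 for two 12-hour halves of the day); the function name is just for illustration.

```python
from math import ceil


def people_needed(shift_weeks, off_weeks, slots=2):
    """Engineers needed so each one does shift_weeks on call followed by
    at least off_weeks off, with `slots` people on call at any time."""
    cycle_weeks = shift_weeks + off_weeks
    # Each slot needs enough people to fill one full cycle of shifts.
    return slots * ceil(cycle_weeks / shift_weeks)


print(people_needed(2, 12))  # 14: two-week shifts, 12 weeks off
print(people_needed(2, 24))  # 26: two-week shifts, 24 weeks off
```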