How to remove yourself from being a bottleneck?

Anonymous User at Taro Community
17 days ago

Due to unforeseen circumstances over the past 6-8 months, I've been the most senior engineer on my team (I have only ~2.7 YOE in total). My team consists of ~12 SDE 1s (new hires) and 2 SDE2s (the other SDE2 was promoted very recently). My manager does a great job of filling the Senior Engineer role, which takes a bit of pressure off me.

However, out of necessity, I've ended up being the SME for all the services owned by our team. This leads to everyone reaching out to me for help with their queries. I try to document some of these in the wikis so that they are easily accessible to others next time. However, for certain tickets and issues, I end up having to pick the task up myself (my manager does not ask me to, but I know that the ramp-up time for someone else to fix the issue would be too high).

I tried to cut back on this about 2 months ago, but our overall ticket health got worse, and I had to start looking into the tickets myself again and guiding each on-call cycle with the right action items for them.

This involves helping them with the following:

  • Prioritising the right tickets to look into during the on-call cycle.
  • Suggesting a potential fix for each ticket so that they know where to look.

As a result, it takes 6+ hours a week to keep this running. I don't really mind doing this, but I don't think it's a scalable solution. I'd eventually like to scale down my involvement and have my team become self-sufficient.

What's the best way to go about this without affecting my team's ticket health?

11 Likes
1.7K Views
3 Comments

Discussion

(3 comments)
  • Steve Huynh
    Principal Software Engineer at Amazon
    16 days ago

    I'm interpreting what you're saying as follows: you are effectively acting as a senior engineer by being the SME on your team's services, doing things like fielding queries from other teams, and this forces you to choose between acting in this capacity and addressing your team's ticket queue.

    The simple answer is that a senior engineer is expected to do both at a high level. There are three paths forward:

    1. Work extra time. This isn't sustainable in the long run, will lead to burnout, and will potentially introduce performance problems for you down the line.
    2. Transition away from being the SME. I'm going to assume you want to be a senior so we'll rule this out.
    3. Get more efficient, deep dive on root causes, and drive long-term fixes. It seems that you've started doing this on the SME side by creating documentation and wikis. Realize that this type of work is long-term and pays dividends in the future. You should apply this thinking to your team's ticket queue. If your team keeps getting the same class of ticket, spend your oncall time and spare cycles on fixing the root causes. Make sure your ticket queue doesn't boil over, but prioritize fixing bad alarming, getting finer-grained metrics, and fixing code-level problems rather than only focusing on ticket-level resolution. This may require some extra time and effort in the short term, but it won't be permanent. Eventually, like the documentation work you are doing, things should get better over time and there will be less on your plate.

    Realize that the way out of this is by taking a longer-term perspective and by making sure your day-to-day is chipping away at making yourself and your team's processes more efficient.

    Hope that helps.

    -Steve

    16 Likes
  • Kuan Peng
    Senior Software Engineer [L5] at Google
    15 days ago

    Building off what Steve wrote (drive long-term fixes), perhaps you don't have to solve this problem entirely on your own; you could take a directive role instead.

    You can work with your manager to come up with a list of consistently recurring issues. Your manager should really want to solve this issue as well, because they can't afford to have a single point of failure on the team. Ask yourself and your manager questions to figure out what the main contributor is, and address it systematically.

    For example:

    1. Deployments of new code frequently cause outages => Are they catchable by tests? If so, maybe help focus the team on adding tests to their code changes. Or is it more that the underlying systems are not decomposed well enough and thus depend on each other's internal workings?
    2. Lots of false alarms => Are you alerting on the correct metrics? How can you tune your monitoring solution?
    3. Oncall engineers don't know how to triage issues => Has everyone been sufficiently trained? Is there an oncall playbook? Can you have someone shadow you every time?

    Once you have a list of potential deeper issues and possible solutions, prioritize them. You can then either work on the priority solutions yourself, or delegate to a fellow engineer (maybe one of the recently promoted folks?) and you can guide their work.

    8 Likes
  • Brad Messer
    Senior Software Engineer at IBM
    15 days ago

    I've had this myself too, to the point that I haven't really gotten to do much coding in the last few months. My job, though, is to make the team better and to take on the grunt work when necessary to keep everyone helping at a high rate. In my case, people around me are working ~70hr weeks and I'm working ~55 max, with a lot of my days finishing up midday simply because there is no work to do. I scale through others by asking them to do the hard work I don't know how to do, and then I jump in to ease the pressure on my juniors by spending quality time in the trenches. Once one trench gets cleaned up, we as a team can be redeployed to help with other trenches. As stated before, document a lot and keep a long-term focus. Focus on teaching them as much as possible and encouraging them to help each other grow; otherwise the situation becomes untenable very quickly and you start getting turnover.

    3 Likes