The company I work for started tracking a wide range of metrics related to our day-to-day work (with an external tool called LinearB). It integrates with pretty much everything to collect as many metrics as possible, such as lines of code, number of PRs, size of PRs, time spent reviewing, cycle time, time spent in meetings, etc. These tools feel like they only aim to gather as many metrics as physically possible, but they don't always manage to put them into context. For example, if you go on holiday or sick leave, all your metrics go down (for obvious reasons).
Personally, I feel some of these metrics are straight-up toxic, and I've noticed that many people in our company have started feeling paranoid about this and feel an urge to “game” the metrics so their numbers look good.
The reason is that initially we were told the metrics would only be used at the team level, but now we're getting strong signals that they're being used at the individual level as input for things like promos, raises, bonuses, etc. I know there are standards and best practices to follow (like having small, meaningful PRs), but using these metrics as a signal for performance feels stupid, because so much depends on the type of work I'm doing. One week I'm debugging a production incident that may be resolved with a single-line config change; the next week I'm writing tons of unit tests, etc.
We were told that this whole thing is pretty much industry standard and very common at big companies like FAANG. Is that really so? If yes, could you elaborate on how it is implemented and how you deal with the stress of trying to maximize your metrics (which may not follow directly from "getting the work done", so you have to do extra work just to boost your numbers)?
I really appreciate all your input. Thanks.
Unfortunately, yes. These metrics are very public and very important in calibrations at Meta, and I'm confident they're also used across Amazon, Google, and the other Big Tech companies. FWIW, I actually think Meta walked this line quite well. Pretty much every engineering manager made it clear that there's far more to it than metrics (peer feedback is more important), and performance packets were read with the proper context. I knew Staff Engineers who had massive impact and a low diff count (as low as ~15 per half) but still got good ratings (because, again, they had the impact). And people didn't try inflating their commit count with dumb diffs, because you would be caught very fast (those diffs still need to be reviewed) and punished.
All that being said, this is an incredibly nuanced topic, so I just want to lead with some very important points:
I've left much more detailed thoughts here: "How does Stack Ranking work (at FAANG) and how can I be proactive at a base level?"
I recommend checking out the linked resources there as well.
When it comes to dealing with the stress, my advice is to forget about the metrics as much as possible and focus on maintaining a healthy dialogue with your manager (and ideally your skip as well) about how you're trending. More details here: [Masterclass] How To Work Better With Your Engineering Manager
For a senior engineer, I would still expect steady code output, so to be safe, aim to land at least 1-2 commits per week on average.
Unfortunately, I have been in situations where I had to tell mentees to increase their diff count. While it's true that Meta was able to handle the nuance, it's better to simply avoid having to defend a packet that shows only 10 landed commits for the entire half. However, this was mostly for junior and mid-level engineers, so I hope it's not the same for you.
Of the FAANG cultures I've personally experienced or heard about second-hand, Meta seems to be the most aggressive in tracking engineering metrics. CL/diff count tracking is nearly non-existent at Amazon and Microsoft, from what I can tell. Google mostly uses it as a proxy to alert managers to poor performers, but explicitly tries to rule out bias toward high CL output; for example, we don't generally care about being in the top 5 of the org for CL output. So while I wouldn't say it's industry standard, it's certainly common. My general feeling is that the use of these stats probably does more harm than good, but it is what it is.
Having said that, I think these stats can be useful personally.
It's meaningful to use these metrics to measure how you've progressed as a software engineer. While there are many ways to land impact as a SWE, 95% of the time your work will result in some amount of code. While not every company cares about code output in terms of diff count, every company cares about code quality, and good code quality inevitably leads to a higher diff count. I've personally used the metrics to track my own productivity and code velocity, and they've helped me turn improving as a programmer into a game for myself. So while these systems can be abused, I'd advise you not to abuse them, so that you at least have an accurate picture of your own work. Use them to inform your own decisions about planning time for deep work, improving code quality, or whatever. If it's going to be tracked anyway, use it to actually help you.
Organizations should use low code output to flag potential problems.
I think the magnifying glass is definitely on entry/mid-level engineers at companies that do track the stat. The reasoning goes: as an entry/mid-level engineer, you shouldn't have many other responsibilities (glue work) as part of your expectations, so you should at least have time to output code. Glue work becomes part of the job expectations at higher levels, so you can be "excused" from maintaining code output. See https://noidea.dog/glue for anyone curious. While I personally think people who do glue work regardless of their job expectations are awesome, I don't think the performance systems at big tech generally reward it, so it's at least a good idea to have cleanup projects to grind away on if, for example, you're blocked on a major project.
Using code output to highlight high performers doesn't really make much sense, IMO. High performers will generally have very, very visible impact (maybe landing multiple next-level projects), which is far better justification. High code output is just a byproduct of amazing delivery; at best, it's a pat on the back.
I feel that anything beyond using code stats as a "smell" indicator is counterproductive. Managers should always provide contextual evidence for good or bad performance. For example, if I encountered a performance packet saying the candidate wrote 3x the average number of CLs for the year and therefore deserved a high rating or promo, I would question the manager very hard to make sure those CLs actually delivered extraordinary value and that the candidate's technical influence was also next level.
As a senior engineer, you might be able to probe your manager/skip about how these metrics are actually used. Maybe raise it as a concern you've seen others have, and hear how your manager says the metrics are actually used during reviews, so you can help assuage others' fears. You could also help your manager get better context on lower code output by highlighting the other types of work being done, so they don't unfairly punish your teammates.