My code caused a recent incident, and I’ve just finished writing the postmortem. I’ll be presenting it to leadership for the first time and would really appreciate any advice on how to approach it effectively.
To give a bit of context:
The API appeared healthy during the incident—latency, error rates, and general performance all looked normal.
However, a boolean field in the API payload was accidentally removed. The iOS mobile app implicitly relied on that field; the Android app did not.
During code review, I tested on only one platform, where everything worked, and didn't test the other.
This dependency wasn’t documented in the codebase, and the field was being added by Component B, which enriches data from Component A.
My changes were in Component A. Since Component B mocks Component A’s data in its unit tests, the missing field wasn’t caught in CI or during local testing.
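To make the testing gap concrete, here is a minimal Python sketch of how a mocked upstream can hide a missing field, and how a contract-style check against the real upstream output would surface it. Every name in it (`raw_flag`, `is_eligible`, `component_a_output`, `component_b_enrich`, `missing_fields`) is hypothetical and only stands in for the real components; it also assumes Component B derives the boolean from a value supplied by Component A, which may not match the actual code.

```python
# Hypothetical stand-ins; none of these names come from the real codebase.

# Fields that downstream clients (here, the iOS app) are assumed to read.
REQUIRED_BY_CLIENTS = {"is_eligible"}


def component_a_output():
    """Real (unmocked) output of Component A after the change.

    Assumption: the change dropped `raw_flag`, the value B used as input.
    """
    return {"id": 1, "name": "example"}


def component_b_enrich(record):
    """Component B adds the boolean only when A supplies its source value."""
    enriched = dict(record)
    if "raw_flag" in record:
        enriched["is_eligible"] = bool(record["raw_flag"])
    return enriched


def missing_fields(payload, required):
    """Return the required fields that are absent from the payload, sorted."""
    return sorted(required - set(payload))


# Unit-test style: B is fed a frozen mock of A's old output, so it still passes.
mocked_a = {"id": 1, "name": "example", "raw_flag": 1}
unit_gaps = missing_fields(component_b_enrich(mocked_a), REQUIRED_BY_CLIENTS)

# Contract style: B is fed A's real output, so the missing field shows up.
contract_gaps = missing_fields(
    component_b_enrich(component_a_output()), REQUIRED_BY_CLIENTS
)

print("unit-test gaps:", unit_gaps)
print("contract-test gaps:", contract_gaps)
```

The point of the sketch is that the mock captures what Component A used to return, so the unit test keeps passing after A changes. A check that runs Component B against Component A's real output, with the client-required fields written down explicitly, is one way to catch this kind of break in CI.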
I fully acknowledge the mistake, and I want to be transparent and constructive in the review.
If you’ve got any tips—especially on how to communicate technical nuance and accountability clearly to leadership—I’d be grateful.
Thanks in advance!
Here's the SEV review template used at Meta, DERP:
I took this from this blog post. I also talk about this in the debugging course in the video titled Understand The Blast Radius.
If you proactively come up with comms in the DERP format and talk about the learnings, you'll be in a good position! Good luck.