I understand the idea and the purpose behind an atomic diff but what would that look like in practice if we were using Github? My understanding about this is sort of vague.
I'll lay out two different approaches that I've been thinking about along with their pros/cons. Please let me know if I'm thinking about this correctly.
EDIT: Upon using GPT, I found an extremely useful resource from google's documentation that actually outlines this problem very well, and resolved my question (links below):
How to do a code review | eng-practices (google.github.io)
Tl;dr -- it's better to have smaller and more focused PRs (approach 2) since smaller PRs are easier to fix compared to larger ones. If there are merge conflicts when rebasing, the impact on subsequent PRs could be mitigated due to their smaller sizes. Other additional advantages may include (1) faster feedback from peers -- esp. if you're wasting time solving the wrong problem(s) or needing additional feedback for approach, or (2) allow for concurrent engineering while some PR(s) can enter their review cycle. One feature can be broken up into multiple atomic changes (PRs).
With this, I would recommend creating a thread of PRs linked together describing all of their changes made during the development cycle. Each PR should serve their atomic purpose, and all of their changes combined should contribute towards fulfilling some story description (which gives additional context to the reviewer).
General principle: reduce the amount of changes you would need to make for each PR by making them smaller. Otherwise, it may take longer for each PR to be reviewed and gain acceptance for submission, which could cause a chain effect on subsequent PRs as well.
Approach 1 (1 big PR with atomic commits):
Say we have a feature branch that we're working on. We keep each commit atomic. We provide a summary description of our PR, a test plan section, and a commit (diff) section in our PR outlining each commit hash and a short summary for each change (in bullet point fashion).
The Files Changed section on github could be extremely long and contain 800-1000 lines of code although we could keep each commit atomic to control the complexity and make it somewhat easier to review with each commit containing 50-200 lines of code. Another pro is we keep 1 PR as opposed to having multiple prs but the con is changes might take longer to approve because of how many different commits there are to review...?
Meanwhile, there's approach 2 which is to keep several PRs (each 50-250 lines of code) and a master PR that aggregates all of the smaller PRs to outline a story of all the changes that are necessary for some feature that we're working on. The pros is smaller PRs are easier to review and easier to merge. The con is we have to pay the cost of rebasing subsequent branches especially if there are stacked dependencies from one pr to another.
I'm not really sure what the "best approach" would be. Any thoughts?
Yeah, your TLDR is pretty spot on.
The general idea for One PR/CL/Diff, One Thesis is that it should be smallest chunk that can be understood, tested, and rolled back. IMO smaller code reviews are designed to (1) lessen the amount of things the reviewer have to think about for each review (2) allow reviewers to spend less time per review, so they can squeeze time to review more frequently, leading to faster convergence on agreed solution. It's the more "empathetic" option.
Option 1 of having a single PR with multiple atomic commits fails both above goals (though probably better than having a giant PR with single or non-atomic commits). Rebasing can indeed be annoying, but lengthy code reviews are worse.