What is an effective way to understand a new repository in order to make the required changes in that repo?

Software Engineer at Microsoft, a month ago

Every time my manager asks me to make changes in a repository that is totally new to me, I get scared.

I prefer to do research by myself first. But I get lost in the new repo reading file by file, and I don't gain any clarity.

So I ask the repository owner for documentation. Mostly they don't maintain any, and even when they do, it is either out of date or consists of detailed feature-by-feature documentation that is usually not relevant to my requirement.

Then I call the POC of that repo, but I can't figure out the right questions to ask in the first call. Over time, I ping him with questions whenever I hit hurdles while working on the requirements.

Sometimes, I use a debugger or add logs to understand the flow of the code.

All of this takes a lot of my time.

What do you suggest for getting clarity in a new repo so that I can complete my requirements in less time?


Discussion

(2 comments)
  • Philip Su
    Ex-Microsoft, Ex-Meta, Ex-CEO of tech nonprofit
    25 days ago

    This is a super-common experience, so I'm glad you brought it up. I worked on six different teams at Microsoft, and then on many teams at Meta (where the challenge was not only the codebase, but the fact that I had never done JavaScript or PHP).

    My main advice on this feeling is to:

    1. get more comfortable with working under ambiguous circumstances, and
    2. focus on the zen of learning how to understand a given codebase

    Obligatory Ambiguity

    When I joined Windows in 2000, it was already perhaps 50M lines of code. In such a world, it's impossible to understand even 5% of the codebase. At the time, even understanding just the DLLs would have been too much to ask. Most large corporations you join will have legacy codebases that are large (perhaps not as large as Windows, but large).

    So here's the deal: most engineers in such codebases are operating with very limited knowledge of the actual codebase. What becomes important then is to be comfortable working with limited knowledge, and to know which things are most important to know.

    Comfort with working on limited knowledge depends on a few things. The existence of thorough tests makes things easier. The factoring of code to have loose coupling / strong cohesion also helps. The discoverability of all code (e.g. monorepo) is huge. But personality is also part of this. Some people can't go on vacation unless every day is already booked and all activities are accounted for. Others arrive at some location and just free-range. The key mindset, in a large codebase, is to embrace that you'll never know even a significant portion of it. The skill becomes how to operate under such circumstances.

    How to Understand

    • Learn which questions to ask. Diving into new codebases is like sight-reading sheet music -- the more you do it, the more your mind develops mental models of where the key questions and pitfalls are. Seek opportunities to dive into new code, unlike people who avoid going into dark corners because they're scared to. The latter sort of person stays scared for life.
    • Fix lots of bugs. Bug-fixing hones your discovery skills: what code calls what, which tools make it easier, etc. Bug fixing also allows you to interact with far more of a codebase than if you write new code in it. It's the fastest way to cover a lot of ground.
    • Don't look for documentation. Unless you're working on a miraculous team (which btw wastes a ton of time keeping documentation updated instead of writing code that impacts customers), you should assume documentation is outdated or non-existent. Code is like Shakira's hips: it never lies.
    • Meta learn. Learn how to learn. You'll develop strategies... like if you're confused about a component, look through git history to figure out which changes last touched it. Then look at those commits, which'll tell you which other components changed together with the current component, and so likely interact with it. Using source code history is just one meta-learning strategy; there are many others. The key is to practice (i.e. dive into new code a lot), and to be observant about which behaviors of yours lead to faster/better outcomes.
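
To make the git-history tactic concrete, here is a minimal sketch in Python, not a definitive tool. It assumes `git` is on your PATH and that the repo is checked out locally. The function names (`parse_commits`, `co_changed_files`, `git_log_for`) and the `@@` marker are my own choices for illustration. Given the history of one file, it counts which other files changed in the same commits, which is a rough hint at what the component interacts with.

```python
from collections import Counter
import subprocess


def parse_commits(log_text):
    """Parse `git log --name-only --pretty=format:@@%H` output into {hash: [files]}."""
    commits = {}
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("@@"):      # "@@" is our own marker in front of each commit hash
            current = line[2:]
            commits[current] = []
        elif line and current is not None:
            commits[current].append(line)
    return commits


def co_changed_files(log_text, target):
    """Count how often each other file changed in the same commit as `target`."""
    counts = Counter()
    for files in parse_commits(log_text).values():
        if target in files:
            counts.update(f for f in files if f != target)
    return counts.most_common()


def git_log_for(path, repo="."):
    """Fetch history for `path`; --full-diff lists every file in each matching commit."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--full-diff", "--name-only",
         "--pretty=format:@@%H", "--", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Usage would be something like `co_changed_files(git_log_for("src/cart.py"), "src/cart.py")`, where the path is a placeholder for whatever file confuses you. Files near the top of the result are good candidates to read next.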
  • Alex Chiou
    Tech Lead @ Robinhood, Meta, Course Hero
    24 days ago

    Great question! There are a lot of great details here that I really appreciate you sharing, so I'll go through them one-by-one.

    "Every time my manager asks me to make changes in a repository that is totally new to me, I get scared."

    Flip the mentality - Don't be scared, be excited! Fear is one of the biggest obstacles holding back software engineers. When you are afraid, that fear infects everything you do. In particular, it prevents you from being bold.

    Now here's the thing: Learning a crazy, new codebase is all about being bold. You need to be bold making seemingly stupid changes to break the code in a super obvious way to maximize your learning. You need to be bold asking your teammates for help, maybe even pair programming with them.

    Every time you pick up a new codebase, you are both learning the tactics behind it and building up your "meta" learning muscle as Philip described. That should make you excited!

    "I prefer to do research by myself first. But I get lost in the new repo reading file by file, and I don't gain any clarity."

    Reading the code, especially file-by-file, is one of the easiest ways to throw your time into a black hole. I recommend these other tactics:

    • Read the blames - Blame the overall module and see which files have the most recent changes. From there, you can pick the files that matter the most. We cover this tactic more in this video: "Learning A New Codebase? Here's How To Figure Out What Matters"
    • Ask your colleagues for a high-level overview - This is a sort of "leveled up" version of the previous tactic that can even go alongside it. Find a core POC for this other repo (you can use recent blame volume to figure this out) and put a meeting on their calendar to discuss:
      • What the most important classes are in the repo
      • How different components talk to each other and the overall end-to-end flow
      • Your goals within this repo
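
Here is a rough sketch of how you might approximate "recent blame volume" in Python. To be clear about the assumptions: it counts recent commits per file and per author with `git log` instead of running `git blame` on every file, which is a cheaper stand-in and not the same thing. The function names, the `@@` marker, and the default time window are all my own picks for illustration.

```python
from collections import Counter
import subprocess


def hotspots(log_text, top=5):
    """Rank files and authors by commit count in `git log --name-only` output."""
    files, authors = Counter(), Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("@@"):      # our own marker in front of each author name
            authors[line[2:]] += 1
        elif line:
            files[line] += 1           # one listed path per file touched by the commit
    return files.most_common(top), authors.most_common(top)


def recent_log(repo=".", since="3 months ago", subdir="."):
    """Fetch recent history for `subdir` with the author name on the marker line."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", "--name-only",
         "--pretty=format:@@%an", "--", subdir],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `hotspots(recent_log(subdir="src"))` (with `src` as a placeholder) gives you two short lists: the files that churn the most, which are worth reading first, and the people who commit the most, who are likely candidates for that overview meeting.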

    "So I ask the repository owner for documentation. Mostly they don't maintain any, and even when they do, it is either out of date or consists of detailed feature-by-feature documentation that is usually not relevant to my requirement."

    This is another classic trap I've seen engineers fall into, hehe. The documentation will never be good enough. This is going to be especially true in a top-shelf, massive company like Microsoft. It's simply too hard to keep the documentation up-to-date, and engineers generally aren't rewarded enough in their performance review to do so.

    Instead of reading the documentation, you should fall back onto more "active" tactics like the ones I described:

    • Changing the code and breaking stuff
    • Talking to people
    • Tactically going through the blames to find hotspots

    "Then I call the POC of that repo, but I can't figure out the right questions to ask in the first call."

    Honestly, a good first question is something like: "Hey, I'm completely new to your codebase, and I need to make the following changes to accomplish [MY_TASK]. I really want to make sure I uphold the integrity of your system and do it properly, so can you give me a high-level beginner's overview of how it all works and best practices?"

    Just show that you're extremely motivated to be a good citizen within their ecosystem. And after you receive their help, make sure to give them deep thanks.

    "Over time, I ping him with questions whenever I hit hurdles while working on the requirements."

    I could see this getting frustrating for this other person, so I highly recommend batching questions together as much as you can. To help with this, you should break down your task into bite-sized chunks through decomposition. From there, you can proactively think through each piece and come up with a bunch of questions at once. Here's a good discussion around decomposition: "How do I make software less overwhelming?"

    "Sometimes, I use a debugger or add logs to understand the flow of the code."

    Both are great! The debugger feels more "refined", but I actually like logs a lot more. From my personal experience, the debugger can hang in really large codebases like those of Meta and Microsoft. If this is the case for you, there's nothing wrong with adding a bunch of logs. When it comes to logging, make sure to overlog to minimize the number of builds you need to do.
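
If the code you're exploring happens to be Python, one cheap way to "overlog" is a small tracing decorator. This is only a sketch under that assumption: `traced` is a helper name I made up, and `apply_discount` and `checkout` are hypothetical functions standing in for code in the unfamiliar repo. It logs every call, return value, and exception, so a single run shows you the call flow.

```python
import functools
import logging

logger = logging.getLogger("trace")


def traced(func):
    """Wrap func so each call logs its arguments, return value, and exceptions."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("-> %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback, then we re-raise unchanged
            logger.exception("!! %s raised", func.__name__)
            raise
        logger.debug("<- %s returned %r", func.__name__, result)
        return result
    return wrapper


# Hypothetical functions standing in for code in the repo you are learning.
@traced
def apply_discount(price, percent):
    return price * (100 - percent) / 100


@traced
def checkout(prices, percent):
    return sum(apply_discount(p, percent) for p in prices)
```

To see the output, enable debug logging first with `logging.basicConfig(level=logging.DEBUG)` and then call `checkout([100, 50], 10)`. You'd temporarily decorate the handful of functions along the path you care about, read the log, and remove the decorators before sending out your change.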

    Lastly, I highly recommend this other discussion around learning new codebases and becoming more independent within them: "How can I become more independent and better at unblocking myself with tricky technical issues?"
