This is a really good question, and as usual for Taro discussions, I have a lot of thoughts. Because of this, I'm splitting my response into 3 parts:
I actually tested AI just for this question. I took a critical view controller (300 LOC) from my Random Name Picker app (1,000,000+ users) and fed it to ChatGPT with the prompt "Describe what the following code does and how it's structured at a high-level": https://github.com/Gear61/Random-Name-Picker/blob/master/app/src/main/java/com/randomappsinc/studentpicker/choosing/NameChoosingActivity.java
Here's what it gave me:
- List Management: Handles the selection of names from a provided list.
- User Interaction: Allows users to choose names, configure settings, view history, reset lists, and more.
- Imports: Import necessary Android libraries and custom classes for functionality and UI.
- Class Declaration: Declares the activity class, extending AppCompatActivity.
- Initialization: Initializes various UI elements, adapters, managers, and data sources within onCreate.
- UI Setup: Sets up the UI elements, populates lists, applies dividers, and handles visibility based on data.
- User Interactions: Defines click listeners (OnClick), handling actions like name choosing, displaying images, managing settings, etc.
- Callbacks & Overrides: Implements various Android lifecycle callbacks (onResume, etc.) and custom interfaces (Listener interfaces) for handling name selection, TTS (Text-to-Speech), clipboard actions, shake detection, etc.
- Menu Handling: Manages options menu creation (onCreateOptionsMenu) and item selection (onOptionsItemSelected) for settings, list editing, resetting, and other actions.
- Initialization: Loads necessary components and sets up the UI.
- User Interaction: Handles user actions such as choosing names, displaying images, speaking names aloud, copying to clipboard, and responding to shake events.
- Settings Handling: Allows users to configure settings via dialogs and apply/revert those settings.
- Data Management: Saves state changes (e.g., onPause) and manages list updates or resets based on user interactions.
- Navigation: Handles navigation back, launching edit activities, and displaying historical data.
This is a complete garbage description of what my code does and how it works. It is primarily just reading the code back to me in a fancy way. It's downright embarrassing.
For example, look at this line: "Class Declaration: Declares the activity class, extending AppCompatActivity." - This is literally how the first line of the class reads. This is Java (not Kotlin), where you explicitly have to write "extends". Real resources of electricity and compute were used to generate this complete waste of text. What a shame...
Here's how I would describe this class to a new engineer onboarding into my codebase:
NameChoosingActivity is the view controller for the actual selection of names. If you want to understand how this core functionality from the app works, start from this class. It uses the delegate pattern (i.e. interfaces) to modularize functionality into separate classes and prevent itself from becoming a god class. You can click into the layout XML to understand how the view is drawn. You can click into the interfaces it implements to understand how different parts of the underlying business logic work (use the class names to identify what each piece does).
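To make the delegate pattern described above concrete, here is a minimal, hypothetical Java sketch. None of these class or interface names (NameChooser, NameSelectionListener, SpeechDelegate) are from the real app; they just illustrate how a view controller can hand business logic off to small interfaces instead of becoming a god class.

```java
import java.util.ArrayList;
import java.util.List;

// Each concern gets its own small interface ("delegate").
interface NameSelectionListener {
    void onNameChosen(String name);
}

interface SpeechDelegate {
    void speak(String text);
}

// The "view controller" only coordinates; the actual business logic
// lives behind the interfaces it was handed.
class NameChooser {
    private final List<String> names = new ArrayList<>();
    private final NameSelectionListener selectionListener;
    private final SpeechDelegate speechDelegate;

    NameChooser(NameSelectionListener listener, SpeechDelegate speech) {
        this.selectionListener = listener;
        this.speechDelegate = speech;
    }

    void addName(String name) {
        names.add(name);
    }

    void choose(int index) {
        String chosen = names.get(index);
        selectionListener.onNameChosen(chosen); // delegate, don't implement inline
        speechDelegate.speak(chosen);
    }
}

public class DelegateDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        NameChooser chooser = new NameChooser(
                name -> log.add("chosen: " + name),
                text -> log.add("spoken: " + text));
        chooser.addName("Alice");
        chooser.addName("Bob");
        chooser.choose(1);
        System.out.println(log); // [chosen: Bob, spoken: Bob]
    }
}
```

The payoff for a new engineer is exactly what's described above: to understand name selection or TTS, you click into the implementation of the relevant interface, not a 3,000-line activity.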
My explanation adds an unseen layer of value on top of the code to actually explain what it does in terms that a junior Android engineer can understand. It also provides actionable steps on how they can further their understanding past this central class. This is what engineers want when they say they want to understand a codebase.
The reason for this is simple: AI cannot truly understand anything, at least not in its current state. How AI works is that it's trained on a mountain of data to the point where it gets extremely good at pattern recognition across a broad range of topics.
Here's a breakdown of why AI falls apart in your use-case:
This is why AI is great for LeetCode problems and basic L3-level code that has 25+ quality StackOverflow threads on it. It's just stitching that information together and sometimes filling in small gaps. Again, it's the expert pattern recognizer and the code it gives you is almost always 95%+ identical to something (or some combination of things) that already exists online.
However, if you look at the code of my view controller and the explanation I generated, the strings couldn't be any more different. There is pretty much 0 overlap. My explanation is purely a creative endeavor. And this is why AI can't do what I just did here.
Ask your teammates! As I have extensively covered in my previous comments, broader-level codebase understanding simply can't be done by the robots. It's good for small, self-contained code like LeetCode algorithms as you mentioned, but there's no way AI can make the inferences and logical leaps necessary to delve into surrounding components and synthesize it all together.
Whenever I start on a new team or am put on a new codebase, I'll find the tech leads and other major engineers behind it and schedule a 1 on 1. In these meetings, I literally ask: "Can you give me an ELI5 explanation of how your code works and how to best work with it? What are the major components I should look into first?"
There are a lot of meta-benefits to putting yourself out there as well, instead of trying to understand the code completely in solitude with AI:
You should also tinker aggressively with the codebase which we cover in-depth in the following discussion (check out the linked resources there too): "What is the effective way to understand new repository in order to make the required changes in that repo?"
In my opinion, AI is great for getting 60% of the understanding in half the time. It is not a magic bullet that can answer all questions, but it definitely saves me time by giving me a high-level understanding of code, especially if the codebase is brand new.
I get the best results when using GPT-4 and adding things like "what is the purpose" or "please provide a high-level takeaway" to the prompt.
Another note: there is a really, really huge difference between GPT-4 and GPT-3.5 in my opinion.
I view AI as a tool to accelerate your progress rather than a substitute for your work.
Using tools like GitHub Copilot and cursor.sh has worked really well for me when I need to jump into a codebase I've never seen before and figure out where a certain piece of logic lives. These tools use something called semantic search, which scans your entire codebase for the most relevant parts. I've used it on an ML codebase with ~20 files for a GAN model (one that I'd never trained before) to figure out how to update a certain parameter, what a certain parameter does, or what transforms are being applied to the images in the training pipeline.
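To illustrate the retrieval idea behind "semantic search," here is a toy Java sketch. Real tools like Copilot and Cursor use learned vector embeddings over your code; this hypothetical version uses simple bag-of-words cosine similarity, which is enough to show how a query like "what transforms are applied to images" gets matched to the most relevant snippet.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToySemanticSearch {
    // Turn text into a term-frequency vector (lowercase, split on non-word chars).
    static Map<String, Integer> termFreq(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : text.toLowerCase().split("\\W+")) {
            if (!tok.isEmpty()) tf.merge(tok, 1, Integer::sum);
        }
        return tf;
    }

    // Cosine similarity between two term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Return the snippet most similar to the query.
    static String search(String query, List<String> snippets) {
        Map<String, Integer> q = termFreq(query);
        String best = null;
        double bestScore = -1;
        for (String s : snippets) {
            double score = cosine(q, termFreq(s));
            if (score > bestScore) {
                bestScore = score;
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical "codebase" of three snippets.
        List<String> files = List.of(
                "def apply_transforms(images): return [resize(normalize(i)) for i in images]",
                "def train_gan(generator, discriminator, lr): ...",
                "def save_checkpoint(model, path): torch.save(model, path)");
        System.out.println(search("what transforms are applied to images", files));
    }
}
```

The real systems are far more sophisticated (embeddings capture meaning, not just shared words), but the shape is the same: vectorize the query, score every chunk of the codebase, surface the top matches.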
I will say, though, that most AI coding tools only work well for Python and in smaller projects.
If you want to use generative AI (like ChatGPT) to understand a codebase, it is important to remember these two principles, which are essential to getting the most out of a tool like this:
Here is my attempt at forming a 'better' ChatGPT prompt for the same code that Alex tested with. I think these results are much more helpful.