
Evaluating whether AI generates "good quality code," plus developer productivity questions

Senior Software Engineer at Taro Community · 2 days ago

I watched Rahul's debugging course and Alex's side projects course the other day. Both were great when it comes to productivity, but they left me with more questions about "good quality code."

How do we evaluate how much of AI is actually generating "good quality code?"

I've found that not all of it is "trash code": some of it is stuff I would have written myself (boilerplate with customizations), along with genuinely useful solutions, though not consistently.

When it hits the point of hallucinating, inventing modules and frameworks that don't exist anywhere in the official documentation of open-source projects, I know it doesn't know what it's doing. In those cases I know I can write better code myself, and often the real problem turns out to be some other error in my own logic, so I know not to depend on it. But beyond that, what else can we use as a gold standard for "good quality code?" I watched Alex's course on code quality, but I feel like I need a mini refresher when I apply it to AI. What is categorized as "good" to use AI for (boilerplate and admin fixes vs. substantial diffs)?

Some companies have banned the use of LLMs completely for various reasons.

For example,

#1) Safety: a lack of guardrails is an issue.

The concern is that a company like Amazon, for example, doesn't end up with rogue agents. I heard this from an Amazon engineer who complained about the lack of guardrails at a developer productivity event at a major tech conference at the end of last year, before Amazon's recent product rollout.

#2) The other major reason, of course, is accuracy and hallucinations.

Both of these reasons are well known, but other major companies have taken the opposite stance: they've embraced LLMs and cited them as increasing developer productivity.

Discussion (1 comment)

    Tech Lead @ Robinhood, Meta, Course Hero · 2 days ago

    As I talk about in my code quality course, the main thing that truly matters is the end-user experience: https://www.jointaro.com/course/level-up-your-code-quality-as-a-software-engineer/what-actually-makes-code-good/

    If the code genuinely delivers a great user experience (and is easily scalable/extendable to continue doing so), it's objectively good code. Maybe not great code (that's code where both the code itself and the end-user experience are very clean), but good. What I've found is that these two are deeply connected. If your code isn't following enough of the good patterns, it's near impossible for it to deliver a good end-user experience.

    When I get code from AI, I always read it line by line to make sure it's not spaghetti. If it's mission-critical code in a space I deeply understand, I'll review it very carefully. If not, I'll read it more briefly. When I'm policing AI code, I always look for the common bad code patterns I cover in the code quality course: https://www.jointaro.com/course/level-up-your-code-quality-as-a-software-engineer/code-comment-caches/

    I have found that bad naming and excessive comments are very common with AI code.
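
    For example, here's a hypothetical sketch of that pattern (the names are made up, not from any real codebase), followed by a cleaner rewrite:

    ```typescript
    // AI-style output: vague name, comments that just restate each line
    function getData(d: number[]): number {
      // Initialize the total to zero
      let total = 0;
      // Loop over the array
      for (const x of d) {
        // Add the value to the total
        total += x;
      }
      // Return the total
      return total;
    }

    // Cleaner rewrite: a descriptive name makes the comments unnecessary
    function sumOrderAmounts(orderAmounts: number[]): number {
      return orderAmounts.reduce((sum, amount) => sum + amount, 0);
    }
    ```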

    As a front-end engineer, I have also found that AI hardcodes UI dimensions a lot instead of using relative constraints.
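
    A quick hypothetical illustration of what I mean (React with inline styles; the component names are made up):

    ```tsx
    import React from "react";

    // AI-style output: hardcoded pixel dimensions that break on other screen sizes
    function OrderCardHardcoded() {
      return (
        <div style={{ width: 380, height: 220, marginLeft: 24 }}>
          Order summary
        </div>
      );
    }

    // Relative constraints: the card adapts to its container and viewport
    function OrderCardResponsive() {
      return (
        <div style={{ width: "100%", maxWidth: "24rem", padding: "1rem" }}>
          Order summary
        </div>
      );
    }
    ```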