I love this question! I haven't worked directly with running code gen agents, but I've done a ton of prompt engineering for other things, like parsing unstructured data and generating insights.
Here's a guide I'd suggest: https://www.promptingguide.ai/ -- it's what I used to get started.
Big things:
- Make sure to clearly give the model all the info it needs. That means explicitly spelling out "here is the input API" and "here is the target API." Also tell it explicitly NOT to make up any APIs -- literally instruct it not to hallucinate and to only use the APIs you specified. Then do some string checking on the response to validate that it complied (see the first sketch after this list).
- Use structured outputs. I suggest GPT-4o-mini's structured outputs mode; it's great at guaranteeing you get the right fields and structure every single time, and GPT-4o-mini is also really cheap :) (second sketch below).
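To make that first point concrete, here's a rough sketch of what I mean -- the two APIs and the allowed-function list are made up purely for illustration:

```python
import re

# Hypothetical example: translating calls from one API to another.
# All function names here are made up for illustration.
ALLOWED_FUNCTIONS = {"target_api.create_user", "target_api.delete_user"}

PROMPT = """You are translating code between two APIs.

<input_api>
input_api.add_user(name: str, email: str) -> int
input_api.remove_user(user_id: int) -> None
</input_api>

<target_api>
target_api.create_user(name: str, email: str) -> str
target_api.delete_user(user_id: str) -> None
</target_api>

Rewrite the following code to use ONLY the target API above.
Do NOT invent or hallucinate any functions that are not listed.

<code>
{code}
</code>
"""

def validate(response: str) -> bool:
    """Cheap string check: reject responses that call anything
    outside the allowed target-API functions."""
    called = set(re.findall(r"target_api\.\w+", response))
    return called <= ALLOWED_FUNCTIONS
```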
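And for the structured outputs bit, a minimal sketch with the OpenAI Python SDK plus Pydantic. The schema fields are just examples, and the `parse` helper lived under `client.beta` in the SDK versions I used, so double-check the current docs:

```python
from openai import OpenAI
from pydantic import BaseModel

class TranslatedCode(BaseModel):
    # Example schema -- adjust the fields to whatever your task needs.
    code: str
    functions_used: list[str]

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "user",
         "content": PROMPT.format(code="input_api.add_user('a', 'a@b.c')")},
    ],
    response_format=TranslatedCode,  # the SDK enforces this schema
)

result = completion.choices[0].message.parsed  # a TranslatedCode instance
print(result.code, result.functions_used)
```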
Small things:
- Give examples if possible. Few-shot examples help a lot.
- Use XML tags to give the model a template (example after this list).
- Lots of trial and error, to be honest. Make sure you have a way to systematically evaluate accuracy. It doesn't need to be automated -- even just a script that runs a prompt over your test inputs and saves the outputs to a file, so you can log and compare different prompt variants (harness sketch below).
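Here's what I mean by an XML-tag template, with a couple of few-shot examples baked in. The task and contents are illustrative only:

```python
# Illustrative only: a template that marks each part of the prompt
# with XML tags and includes two few-shot examples.
CLASSIFY_PROMPT = """Classify the food item as vegan, vegetarian, or non-veg.

<examples>
<example><item>tofu</item><label>vegan</label></example>
<example><item>cheddar cheese</item><label>vegetarian</label></example>
</examples>

<item>{item}</item>
<label>"""
```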
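And the low-tech eval harness: just enough to save every output per prompt variant so you can diff runs later. The file layout and test cases are made up:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical test inputs -- replace with real cases from your task.
TEST_CASES = ["gummies", "tofu scramble", "chicken broth"]

def run_prompt(template: str, item: str) -> str:
    """Call your model here (e.g. the structured-outputs snippet above)."""
    raise NotImplementedError

def evaluate(prompt_name: str, template: str) -> None:
    """Run every test case through one prompt variant and log the results."""
    results = [{"input": item, "output": run_prompt(template, item)}
               for item in TEST_CASES]
    out = Path("evals") / f"{prompt_name}_{datetime.now(timezone.utc):%Y%m%d_%H%M%S}.json"
    out.parent.mkdir(exist_ok=True)
    out.write_text(json.dumps(results, indent=2))

# Fill in run_prompt, then:
# evaluate("v1_xml_few_shot", CLASSIFY_PROMPT)
```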
Other details:
- You can do a surprising amount of stuff with smaller 7B-param models. As a rule of thumb, first validate that it works with a large model, then scale down as much as possible:
- Complex reasoning/code gen: use large models
- Semi-complex tasks, i.e. understanding messy unstructured data and making a decision (e.g. classifying food items: input "gummies", expected output vegan/vegetarian/non-veg): use a medium model (sketch after this list)
- Paraphrasing/summarizing/text extraction: a 7B-param model is fine
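One nice thing about scaling down: if you serve the small model behind an OpenAI-compatible endpoint (Ollama and vLLM both expose one; the URL below assumes a default local Ollama install), the same client code works, so swapping model sizes is cheap to test:

```python
from openai import OpenAI

# Assumes a local Ollama server with its OpenAI-compatible endpoint;
# adjust base_url/model for your setup (vLLM etc. work the same way).
local = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

resp = local.chat.completions.create(
    model="mistral:7b",  # example 7B model tag; use whatever you have pulled
    messages=[{"role": "user",
               "content": CLASSIFY_PROMPT.format(item="gummies")}],
)
print(resp.choices[0].message.content)  # e.g. "non-veg" (gelatin)
```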
If your company allows it, you can also look into using Cursor for running code gen against your repo.