Blog Credit: Trupti Thakur
Image Courtesy: Google
CriticGPT
CriticGPT is an AI model that OpenAI built on top of GPT-4. It was designed to help the human trainers who review ChatGPT's output find mistakes in the code ChatGPT writes. Its most important contribution to the accuracy and reliability of that code is catching bugs that human reviewers might miss.
Research and Development
The research paper “LLM Critics Help Catch LLM Bugs” describes in detail how CriticGPT was built. To sharpen the model's ability to find mistakes, the researchers trained it on code into which bugs had been deliberately inserted, paired with critiques describing those bugs. This training made CriticGPT more accurate at finding and reporting code errors. The study found that, on code containing naturally occurring LLM errors, human annotators preferred critiques written by CriticGPT over critiques written by human reviewers 63% of the time. This suggests that AI-generated critiques can be genuinely useful to people reviewing code.
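As a rough illustration of what a training example built from purposely broken code might look like, the sketch below pairs a correct snippet with a copy containing an inserted bug and a reference critique. The record layout (`TamperedExample`) and the helpers `insert_off_by_one` and `run` are hypothetical, not taken from the paper. In the study, people inserted the bugs by hand; the automated string replacement here only stands in for that manual step.

```python
from dataclasses import dataclass


@dataclass
class TamperedExample:
    """Hypothetical record: original code, a bug-inserted copy, and a reference critique."""
    original_code: str
    tampered_code: str
    reference_critique: str


def insert_off_by_one(code: str) -> str:
    """Toy stand-in for manual tampering: make an inclusive upper bound exclusive."""
    return code.replace("range(1, n + 1)", "range(1, n)", 1)


def run(code: str, n: int) -> int:
    """Execute a snippet that defines total(n) and call it (toy only; never exec untrusted code)."""
    namespace: dict = {}
    exec(code, namespace)
    return namespace["total"](n)


ORIGINAL = "def total(n):\n    return sum(range(1, n + 1))\n"

example = TamperedExample(
    original_code=ORIGINAL,
    tampered_code=insert_off_by_one(ORIGINAL),
    reference_critique="`range(1, n)` stops at n - 1, so total(n) leaves out n itself.",
)

print(run(example.original_code, 4))  # prints 10 (1 + 2 + 3 + 4)
print(run(example.tampered_code, 4))  # prints 6 (1 + 2 + 3): the inserted bug
```

A critic trained on many such pairs is rewarded for producing critiques that point at the inserted bug, which is the behaviour the reference critique describes.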
Innovations in Review Techniques
CriticGPT uses a new test-time search technique called “Force Sampling Beam Search” (FSBS) to generate longer, more detailed critiques, which in turn help human reviewers write better and more thorough reviews. The method also lowers the rate of “hallucinations”, cases where the model reports bugs that do not exist, as well as flags on issues that do not matter. One of its most important benefits is that the thoroughness of error-finding can be adjusted. This lets reviewers choose the right balance between catching more real bugs and avoiding unnecessary error flags.
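The sketch below does not reproduce Force Sampling Beam Search itself. It only illustrates the adjustable thoroughness described above with a hypothetical selection rule: among several candidate critiques, pick the one with the highest quality score plus a tunable bonus for each flagged code section. The `Candidate` fields, the scores, and the `length_bonus` parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    reward: float    # hypothetical quality score, e.g. from a reward model
    num_flags: int   # number of code sections the critique flags as problematic


def pick_critique(candidates: list[Candidate], length_bonus: float) -> Candidate:
    """Return the candidate maximising reward + length_bonus * num_flags.

    A larger length_bonus favours more thorough critiques (more real bugs caught,
    but also more spurious or nitpicky flags); a smaller one favours shorter,
    more precise critiques.
    """
    return max(candidates, key=lambda c: c.reward + length_bonus * c.num_flags)


candidates = [
    Candidate("Flags the off-by-one error only.", reward=1.0, num_flags=1),
    Candidate("Flags the off-by-one, a naming nit and a possible overflow.", reward=0.6, num_flags=3),
]

print(pick_critique(candidates, length_bonus=0.0).num_flags)  # prints 1
print(pick_critique(candidates, length_bonus=0.5).num_flags)  # prints 3
```

With no bonus, the shorter, higher-scoring critique wins (1.0 vs 0.6); with a bonus of 0.5 per flag, the longer critique wins (2.1 vs 1.5). That single knob is the kind of precision-versus-thoroughness trade-off the paragraph above describes.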
Limitations
Despite its strengths, CriticGPT has limitations. It struggles with long, complex programming tasks because it was trained on relatively short ChatGPT responses. It also does not always catch errors that are spread across several parts of the code, a common situation in real-world software development.
To sum up, CriticGPT is a significant step forward in AI-assisted code review. By combining GPT-4 with targeted training and new search techniques, it improves the code review process. Like any tool, however, it has weaknesses that make it less useful for more complex code.
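The cross-section weakness mentioned under Limitations is easiest to see in a small, hypothetical example (not taken from the paper). Each function below looks correct on its own; the bug appears only when both are read together, because one part of the code stores prices in cents while the other assumes dollars. The module names and prices are invented for illustration.

```python
# Section A (hypothetical pricing module): prices are stored in cents.
def item_price_cents(item: str) -> int:
    prices = {"book": 1299, "pen": 150}
    return prices[item]


# Section B (hypothetical checkout module): written as if prices were in dollars.
def order_total_dollars(items: list[str]) -> float:
    # Looks reasonable in isolation, but silently treats cents as dollars.
    return sum(item_price_cents(i) for i in items)


print(order_total_dollars(["book", "pen"]))  # prints 1449, not the intended 14.49
```

A reviewer, human or model, who inspects only `order_total_dollars` sees a plausible sum; the error becomes visible only by connecting it to the unit convention in `item_price_cents`.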