AI Code Reviewer: Enhancing Code Review Efficiency with ChatGPT

Code review is a crucial part of the software development process. It helps maintain code quality, find potential bugs, and ensure adherence to coding standards. However, code review can be time-consuming, especially for senior developers, who are often required to provide general-purpose comments on various aspects of the code. Code reviews, like most things in this world, can be seen as a tradeoff. On one hand, the aim is to decrease the chance of bugs being smuggled into the codebase, to make sure the code adheres to standards, and to have another pair of eyes look at the code to make sure it is not missing the mark. On the other hand, code reviews can stall development if not done efficiently and are often costly for organizations: it is usually the senior developers who give the most valuable review comments, and the more time they spend reviewing, the less time they spend elsewhere. There is a clear incentive for all parties involved to increase the efficiency of code reviews and decrease the negative impact they have on the IT organization.

To alleviate this burden and make the code review process more efficient, we propose AI Code Reviewer.

The Reality of Code Reviews

Code reviews serve multiple purposes. In the industry, we like to say that they increase the security of the merged code, but there is no real measure to prove that. In practice, code reviews are full of suggestions regarding code style and structure: things like “extract this to a variable/method”, “change this name to be more descriptive”, or “log a message here”. Let’s say that 80% of the comments are about things that are common, easy to spot, and a matter of general knowledge, and 20% are real to-the-point comments questioning logic and possible runtime issues. If we can automate the first group, we can substantially cut the time needed to review a given piece of code.

Enter AI Code Reviewer.

How the AI Code Reviewer Works

Since Large Language Models were trained on swaths of code and internet pages, we can leverage them to apply the general “how to code” knowledge during code review. We can then focus on the logic and runtime considerations, knowing the AI has already dealt with “the usual”.

The idea is straightforward: take the file changes from the Merge Request, send them to the OpenAI API with proper prompts, and receive the review comments. Since we use GitLab at Astrafy, we wrapped the script in GitLab CI, and it is triggered on Merge Request events, such as opening one or updating one with additional commits.
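As a rough sketch of the first step, the snippet below shows one way to collect the changed files of a Merge Request with the python-gitlab library inside a GitLab CI job. It assumes the predefined CI variables CI_SERVER_URL, CI_PROJECT_ID and CI_MERGE_REQUEST_IID are available (the last one only exists in merge request pipelines) and that a personal access token is stored in a variable named PAT. The helper names changes_to_diffs and fetch_mr_diffs are made up for illustration and are not taken from the original script.

```python
import os
from dataclasses import dataclass


@dataclass
class Diff:
    path: str
    diff: str


def changes_to_diffs(changes):
    """Convert the "changes" entries returned by GitLab into Diff objects.

    Entries with an empty diff are skipped, since there is nothing to review.
    """
    return [
        Diff(path=change["new_path"], diff=change["diff"])
        for change in changes
        if change.get("diff")
    ]


def fetch_mr_diffs():
    """Fetch the diffs of the Merge Request that triggered the CI job."""
    # python-gitlab is imported lazily so changes_to_diffs has no dependency on it.
    import gitlab

    gl = gitlab.Gitlab(
        url=os.environ["CI_SERVER_URL"],
        private_token=os.environ["PAT"],  # assumed variable name
    )
    project = gl.projects.get(os.environ["CI_PROJECT_ID"])
    mr = project.mergerequests.get(os.environ["CI_MERGE_REQUEST_IID"])
    # mr.changes() returns a dict whose "changes" key lists the modified files.
    return changes_to_diffs(mr.changes()["changes"]), mr
```

Keeping the conversion separate from the network call makes the formatting logic easy to check without a GitLab instance.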

ChatGPT Considerations

We can use different roles when dealing with the OpenAI API. We use the system prompt to explain how we want ChatGPT to behave: it describes its role as the code reviewer, the format in which we will send the diff, and what we expect as output. Then, as the user prompt, we send the actual diffs marked with file paths. We also decided to use GPT-3.5 in order not to drain GPT-4 messages too fast.

message = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": "You are a code reviewer on a Merge Request on Gitlab. Your responsibility is to review "
                       "the provided code and offer "
                       "recommendations for enhancement. Identify any problematic code snippets, "
                       "highlight potential issues, and evaluate the overall quality of the code you review. "
                       "You will be given input in the format PATH: <path of the file changed>; DIFF: <diff>. "
                       "In diffs, plus signs (+) will mean the line has been added and minus signs (-) will "
                       "mean that the line has been removed. Lines will be separated by \\n."
        },
        {
            "role": "user",
            "content": user_message
        }
    ],
)

JSON payload for the OpenAI API
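The call above uses the openai.ChatCompletion interface, which was removed in version 1.0 of the openai Python package. A minimal sketch of the same request against the newer client interface might look like the following. The function names build_messages and request_review are illustrative, and the system prompt is shortened here for brevity; the client is passed in as an argument so the function can be exercised with a stand-in object.

```python
# Shortened for the example; in practice the full system prompt would be used.
SYSTEM_PROMPT = (
    "You are a code reviewer on a Merge Request on Gitlab. "
    "You will be given input in the format PATH: <path of the file changed>; DIFF: <diff>."
)


def build_messages(user_message, system_prompt=SYSTEM_PROMPT):
    """Return the system/user message pair expected by the chat completions API."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]


def request_review(client, user_message, model="gpt-3.5-turbo"):
    """Send the diffs and return the review text.

    `client` is expected to behave like openai.OpenAI() from openai>=1.0.
    """
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(user_message),
    )
    return response.choices[0].message.content
```

With the real library, the client would be created with OpenAI() from the openai package, which reads OPENAI_API_KEY from the environment by default.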

Lessons learned

We found that satisfying the AI Code Reviewer is impossible. It will generate remarks as long as there are any changes. Learn to sift through its remarks to find the valuable ones instead of blindly following them.
It is easier to separate the code review into two parts. In the first part, use the AI Code Reviewer to find out where you can improve syntax and structure. In the second part, ask a human to review the code, disregarding the comments from the AI. This way you are not caught in a back-and-forth between the AI and colleagues while trying to address all the changes.
Remember that using this tool means sending your code diffs to OpenAI, which is discouraged in some organizations, if not outright forbidden. Make sure you are allowed to use it before employing it at full scale in your organization.
Large diffs are problematic. ChatGPT has a token limit, which makes it impossible to send arbitrarily large code changes. You can modify the code to split them into multiple requests; however, you will then receive multiple separate reviews instead of one. In large code reviews, context is everything, and the ability to follow the changes from one place to another is crucial. If you automatically split the diffs based on arbitrary numbers, you run the risk of getting incoherent reviews.
Very small reviews are also problematic. ChatGPT will attempt to generate, for example, 8 distinct comments based on your 3 changed lines. You might want to introduce a lower limit on the number of changed lines that qualifies for an AI review.
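The last two points can be handled with a small amount of gating logic before anything is sent to the API. The sketch below is one possible approach and is not part of the original script: it skips Merge Requests with too few changed lines and groups per-file diffs into batches under a size budget, keeping each file's diff whole so that a split never lands in the middle of a file. The thresholds, the character-based budget (a crude stand-in for a real token count) and the function names are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Diff:
    path: str
    diff: str


def count_changed_lines(diff_text):
    """Count added/removed lines, ignoring '+++' / '---' file headers.

    This is a heuristic: a changed line whose own content starts with
    '++' or '--' would be mistaken for a header and not counted.
    """
    return sum(
        1
        for line in diff_text.splitlines()
        if (line.startswith("+") and not line.startswith("+++"))
        or (line.startswith("-") and not line.startswith("---"))
    )


def is_worth_reviewing(diffs, min_changed_lines=10):
    """Return True when enough lines changed to justify an AI review."""
    return sum(count_changed_lines(d.diff) for d in diffs) >= min_changed_lines


def batch_diffs(diffs, max_chars=8000):
    """Group diffs into batches whose combined length does not exceed max_chars.

    Each file's diff stays in one piece; a single diff larger than
    max_chars still gets a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for d in diffs:
        size = len(d.diff)
        # Start a new batch when adding this diff would exceed the budget.
        if current and current_size + size > max_chars:
            batches.append(current)
            current, current_size = [], 0
        current.append(d)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Batching by whole files does not solve the coherence problem described above, since related changes can still end up in different requests, but it at least avoids cutting a single file's diff in half.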

AI Review Code

The code is simple and extensible. It boils down to three steps:

Prepare a message with the changed lines that ChatGPT should review.
Send the message, along with instructions on what to do with it, as prompts.
Add the response as a review comment.

For a full example, see a public repository with this hooked up on every Merge Request here. Below is the Python script with the majority of the logic.

from typing import List, Any
import gitlab
import os
from itertools import dropwhile
import openai
from dataclasses import dataclass
import logging

logging.basicConfig(encoding='utf-8', level=logging.INFO)


@dataclass
class Diff:
    path: str
    diff: str


gl = gitlab.Gitlab(private_token=os.environ["PAT"])
openai.api_key = os.environ["OPENAI_API_KEY"]


def main():
    diffs, mr = get_diffs_from_mr()
    response = get_review(diffs)
    logging.info(response)
    mr.discussions.create({'body': response})


def get_review(diffs):
    user_message_line = ["Review the following code:"]
    for d in diffs:
        user_message_line.append(f"PATH: {d.path}; DIFF: {d.diff}")
    user_message = "\n".join(user_message_line)
    message = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "You are a code reviewer on a Merge Request on Gitlab. Your responsibility is to review "
                           "the provided code and offer "
                           "recommendations for enhancement. Identify any problematic code snippets, "
                           "highlight potential issues, and evaluate the overall quality of the code you review. "
                           "You will be given input in the format PATH: <path of the file changed>; DIFF: <diff>. "
                           "In diffs, plus signs (+) will mean the line has been added and minus signs (-) will "
                           "mean that the line has been removed. Lines will be separated by \\n."
            },
            {
                "role"

AI Code Reviewer script

Conclusion

AI reviews will slowly become a thing in the industry, as does anything that improves efficiency and cuts costs. It is worth noting that they will not replace human code reviewers but will instead supercharge code reviews with AI guidance. While code reviews are sometimes bypassed or done inefficiently, an AI code reviewer helps address that by guiding, preparing, and pointing out potentially bad practices in the code. In the end, it just adds one fully automated step to a merge request: