It's very easy to build a basic code review tool.
It's hard to build one that developers won't ask you to turn off because of false positives (or one that will miss your next escaped bug)
I think if all the tool does is run a claude code level /review skill (which all developers should definitely run before they even open a PR) then isn't this a bit of a review theater? Just a guardrail to those developers who don't run a /review-triage-fix skill in /loop before they take the PR out of draft?
I wonder how many PRs in the world got to production where several developers commented on each other's code, and none of them read anything, just used their gh cli / MCP to post / answer comments / fix issues on their behalf.
There is going to be an exponential growth of code generated, and you can't escape AI code review, but also there is no real difference between having Claude Code write the code and review itself locally, vs communicating with itself via a slow and downtime prone medium of "PR comments"
tl;dr - without any human in the loop reviewing the AI code review, or skimming to see what the AI code review missed, there is no real reason to use a "code review" you can just run it as part of the CI/CD and hope AI won't miss anything (according to my linkedin feed, there are people out there who really thing this way...)
Yes! Where it gets really interesting is the scenario in which every developer has their own unique review skill/workflow, so the reviews end up being different than you running it yourself, but nobody is reading them still.
> Can't you simply ask codex in another tab to just do a code review?
You are likely to get better results if you do not use the same model for review that wrote the code. I typically use Opus for code editing and GPT 5.5 for peer review using an automation with skills.
Training set is different between models. If there are gaps in coverage in one model, you want a different model reviewing the work. The second model will its own gaps, but the gap list is not identical.
Developers should definitely use whatever tool they use to review the code they (or the tool) just wrote. We have a skill that does this in a loop - spin subagents, review (based on our coding standards), triage the review in another subagent, fix what's applicable, push back on what's not, and we run this in a loop. This is before you even open a PR.
The idea of a PR is for others to find things that you have a blind spot to, and also leave some paper trail on the thought process. E.g. if something was not fixed, there is a history of a comment and a reason on WHY it wasn't fixed. If you do all that only locally, that context is lost.
We noticed that even after doing this self review loop multiple times, we still find issues (either via other models / tools or via humans that have the "tribal knowledge")
Maybe one day AI will write perfect code and can review itself, but even if it's 0.1% chance it has a bug, or 1 in a million it will do something a bit sinister (like open a backdoor just in case you try to shut it down) - then I really think there is always going to be a need for humans to review something.
I recently moved off Cursor's BugBot because it's no longer a flat $40, and I feel a little lost trying to find a viable alternative because there are so many and the pricing kind of sucks for all of them. Curious if anyone has a recommendation.
My team tried coderabbit and qodo and they are both trash compared to a tool we quickly built in-house that is more or less a thin wrapper around claude/codex, along with per-repo skills. PR review is triggered by webhooks from github to the review tool's web app. The tool shared by OP from alibaba certainly does some things ours does not and appears more sophisticated, but we have never had the problems they mention.
"The agent can read full file contents, search the codebase, inspect other changed files for context, and produce deep reviews — not just surface-level diff feedback." our tool does all this too. It catches dumb typos as well as more complicated bugs. Not to mention it is great as a ratchet (https://qntm.org/ratchet). It is not a substitute for reviews from other engineers though, since obviously it does nothing to achieve one of the main goals of code review, which is to socialize knowledge of the codebase.
Alibaba's work here is almost certainly more advanced than what we've done, but ours has been perfectly satisfactory and better than the paid offerings we've tried. I think most teams should not be paying SaaS fees for AI code review, that is the kind of business that mostly should not exist any more.
We've been using Coderabbit, great deal ($30/mo/dev flat) and finds a lot.
I also built a skill I call `/meta-review` that asks Codex, Cursor, and Gemini to review the code (I use Claude Code). It always finds little things claude & I missed.
I've tried many AI code review tools. Nothing comes close to the depth of CodeRabbit reviews. It's the only such tool that can find real logical bugs. I'd love to be able to get Claude Code to do similar quality of review, but I can't get it right, no matter how I try.
Is it actually flat fee? I loved Cursor bugbot which was flat fee but they moved to per-run and that killed it for me, but a lot of others are doing the same.
- very good recall (~74%, e.g. found a lot of the golden issues)
- not so good precision (~12%, e.g. lots of false positives)
- the precision causes the F1 to tank (~20%, if this stays the same on the full 50 sample it would puts it almost last, even less than Kilo+Grok)
uses a node image installs claude code runs a /review-like command puts inline comments to PR deletes old comments when rerunning
OCR seems cool, but overkill, and I'm definitely not using Code Rabbit after their CEO was on here acting snobbish a while back.
Point being AI code review in Git** itself isn't hard to do and can add a lot of value quickly.
It's very easy to build a basic code review tool. It's hard to build one that developers won't ask you to turn off because of false positives (or one that will miss your next escaped bug)
I think if all the tool does is run a claude code level /review skill (which all developers should definitely run before they even open a PR) then isn't this a bit of a review theater? Just a guardrail to those developers who don't run a /review-triage-fix skill in /loop before they take the PR out of draft?
I wonder how many PRs in the world got to production where several developers commented on each other's code, and none of them read anything, just used their gh cli / MCP to post / answer comments / fix issues on their behalf.
There is going to be an exponential growth of code generated, and you can't escape AI code review, but also there is no real difference between having Claude Code write the code and review itself locally, vs communicating with itself via a slow and downtime prone medium of "PR comments"
tl;dr - without any human in the loop reviewing the AI code review, or skimming to see what the AI code review missed, there is no real reason to use a "code review" you can just run it as part of the CI/CD and hope AI won't miss anything (according to my linkedin feed, there are people out there who really thing this way...)
You are likely to get better results if you do not use the same model for review that wrote the code. I typically use Opus for code editing and GPT 5.5 for peer review using an automation with skills.
Training set is different between models. If there are gaps in coverage in one model, you want a different model reviewing the work. The second model will its own gaps, but the gap list is not identical.
And if you put the review effort into polishing an impl plan, then it doesn't matter which model implements it either.
The idea of a PR is for others to find things that you have a blind spot to, and also leave some paper trail on the thought process. E.g. if something was not fixed, there is a history of a comment and a reason on WHY it wasn't fixed. If you do all that only locally, that context is lost.
We noticed that even after doing this self review loop multiple times, we still find issues (either via other models / tools or via humans that have the "tribal knowledge")
Maybe one day AI will write perfect code and can review itself, but even if it's 0.1% chance it has a bug, or 1 in a million it will do something a bit sinister (like open a backdoor just in case you try to shut it down) - then I really think there is always going to be a need for humans to review something.
They do open source a fair bit of internal tooling, so it’s always interesting to see their approach
Wish they chose a different acronym...
We have our own internal automated review which has shown positive results, but I would love to drop it if I find something better.
Code review is currently our bottleneck, so any possibility of better automating it is welcome.
https://codereview.withmartian.com
"The agent can read full file contents, search the codebase, inspect other changed files for context, and produce deep reviews — not just surface-level diff feedback." our tool does all this too. It catches dumb typos as well as more complicated bugs. Not to mention it is great as a ratchet (https://qntm.org/ratchet). It is not a substitute for reviews from other engineers though, since obviously it does nothing to achieve one of the main goals of code review, which is to socialize knowledge of the codebase.
Alibaba's work here is almost certainly more advanced than what we've done, but ours has been perfectly satisfactory and better than the paid offerings we've tried. I think most teams should not be paying SaaS fees for AI code review, that is the kind of business that mostly should not exist any more.
I also built a skill I call `/meta-review` that asks Codex, Cursor, and Gemini to review the code (I use Claude Code). It always finds little things claude & I missed.
Coderabbit just came out with their own PR review UI that's great for big PRs, it groups files together etc. https://www.coderabbit.ai/blog/introducing-atlas-the-first-a...
How do you see CodeRabbit against other AI code review solutions? E.g. cubic.dev, Qodo, Graphite, Greptile, Baz, Augment Code...
An alternative UI to GitHub is well overdue. But once someone will get it right, everyone will copy them...
Yea I liked bugbot too but it became pretty pricey.