INDUSTRY · JUNE 4, 2026 · 10 MIN READ
The bottleneck moved: from generating code faster to validating it better
The cost of writing code fell to near zero. The cost of trusting it did not. That gap, not a vendor fight, is the real story behind the AI code review debate.
For thirty years the scarce resource in software was the act of writing it. Tools, languages, and abstractions all aimed at the same target: let a person express more behavior in less time. AI assistants are the latest and largest move in that direction, and they worked. The cost of producing a plausible diff has fallen to near zero.
The cost of trusting that diff has not moved at all.
Two costs that used to travel together
Writing code and vouching for code were once the same activity, performed by the same person, at roughly the same speed. An engineer who wrote a function understood it well enough to defend it. Generation severed that link. A diff can now arrive without anyone having reasoned about it, which means the work of deciding whether it is correct is a separate, growing task that lands on someone else.
That someone else is a reviewer, and reviewer throughput is flat. A person reads diffs at human speed regardless of how the diffs were produced. So the pipeline now has a fast front and a slow back, and once diffs arrive faster than reviewers can clear them, the queue between the two grows by the difference every day. The better generation works, the faster it grows.
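The arithmetic behind that queue can be sketched in a few lines. This is a minimal illustration, not a measurement: the function name `review_backlog` and every number in it are assumptions chosen for the example, and it treats both the arrival rate and the review rate as constant.

```python
def review_backlog(arrivals_per_day: float, reviews_per_day: float, days: int) -> list[float]:
    """Return the number of unreviewed diffs at the end of each day."""
    backlog = 0.0
    history = []
    for _ in range(days):
        # New diffs join the queue; reviewers clear at most their fixed daily capacity.
        backlog = max(0.0, backlog + arrivals_per_day - reviews_per_day)
        history.append(backlog)
    return history


# Hypothetical numbers: reviewers clear 20 diffs a day in both scenarios.
before = review_backlog(arrivals_per_day=18, reviews_per_day=20, days=5)
after = review_backlog(arrivals_per_day=60, reviews_per_day=20, days=5)

print(before)  # [0.0, 0.0, 0.0, 0.0, 0.0]
print(after)   # [40.0, 80.0, 120.0, 160.0, 200.0]
```

While arrivals stay under review capacity, the backlog stays at zero. Once they exceed it, the backlog grows by the difference every day, and no change to the front of the pipeline brings it back down.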
| Era | Cheap | Expensive | Where time goes |
|---|---|---|---|
| Pre-assistant | Nothing | Writing code | Authoring |
| Assistant | Writing code | Trusting code | Review queue |
The "bubble" framing misses the shift
A popular argument holds that AI code review is a bubble — too many tools chasing a problem that better generation will dissolve. The reasoning goes: if models write good enough code, review stops mattering.
That gets the direction backward. Better generation does not shrink the review problem; it enlarges it. More acceptable diffs per day means more diffs to vouch for per day. The volume that makes generation look successful is the same volume that floods the reviewer. Treating this as a fight between code-review vendors mistakes a structural change for a product cycle.
The constraint on shipping software moved downstream, from authoring to validating. Tools that only generate faster push harder against a wall they helped build.
Validation is the part that has to scale
If the bottleneck is trust, the question worth answering is how trust scales with volume. Human review does not — it is a fixed-rate resource. So the work is to make a machine produce the one thing review actually consumes: evidence.
Not a second opinion. Not a confidence score. Evidence — a reconstructed condition, a behavioral check, a record of what was verified and how. A reviewer handed evidence spends time on judgment instead of archaeology. A reviewer handed another unexplained diff spends time doing the generation tool's unfinished work.
This is the line Hyrax is built on. The point is not to write more code than a person. It is to validate a change well enough that the evidence travels with it into the pull request, so a reviewer's flat throughput is spent on the decisions only a human should make.
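To make "evidence that travels with the change" concrete, here is one hypothetical shape such a record could take. It is a sketch under stated assumptions, not Hyrax's actual format: the `Check` and `Evidence` classes, their fields, and the sample values are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Check:
    """One behavioral check that was run against the change (hypothetical)."""
    name: str
    method: str   # how it was verified, e.g. "unit test"
    passed: bool


@dataclass
class Evidence:
    """Hypothetical record attached to a change before it reaches a reviewer."""
    change_id: str
    reconstructed_condition: str  # the situation the change is meant to handle
    checks: list[Check] = field(default_factory=list)

    def failed(self) -> list[str]:
        """Names of checks that did not pass; these need human attention first."""
        return [c.name for c in self.checks if not c.passed]

    def summary(self) -> str:
        """One-line record of how many checks passed."""
        passed = sum(c.passed for c in self.checks)
        return f"{self.change_id}: {passed}/{len(self.checks)} checks passed"


# Illustrative values only.
evidence = Evidence(
    change_id="example-change",
    reconstructed_condition="empty cart submitted at checkout",
    checks=[
        Check("returns 400 on empty cart", "unit test", True),
        Check("no order row written", "database assertion", False),
    ],
)

print(evidence.summary())  # example-change: 1/2 checks passed
print(evidence.failed())   # ['no order row written']
```

The design choice worth noting is that the record states what was checked and how, rather than a single score. A reviewer can go straight to the failed check instead of reconstructing the condition from the diff.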
What this means for the next few years
Teams will not be differentiated by how much code their tools can produce. That volume is already effectively unlimited, and its cost is trending toward zero. They will be differentiated by how much produced code they can trust per unit of human attention, that is, by whether their validation scales with their generation or falls behind it.
The cheap thing got cheaper. The expensive thing stayed expensive. The work is on the expensive side now.