CODE HEALTH · APRIL 30, 2026 · 8 MIN READ

AI slop: what it is, what it costs, and how to see it in your repo

Slop is low-effort AI-generated code that looks plausible and compiles. The cost is downstream and measurable. Three signals make it visible on your repo today.

AI slop is the term that emerged in 2026 to describe the low-effort, mass-produced code that AI coding assistants generate when nobody is paying attention. The word borrows from the broader internet sense of slop, where AI output floods a medium with content that looks legitimate on the surface and degrades the medium underneath. In software, slop is code that compiles, passes basic tests, and looks reasonable in a PR review. The cost arrives later, when the same function exists in three places, the convention drifts away from the team's established patterns, and a fix has to be applied in places nobody remembered existed.

The first studies measuring it are now public. GitClear's 2026 AI Assistant Code Quality Research, the most-cited dataset of the year, found duplicated code blocks rising roughly eightfold since AI tools went mainstream. Copy-paste rates climbed from 8.3% to 18% across the same period. Code churn, the rate at which committed code is rewritten or deleted within two weeks of landing, rose from 3% to 7.9%, more than doubling. Heavy AI users produce 4 to 10 times more durable code than non-users and 9 times more churn. The pattern is consistent: the code arrives faster, and a meaningful fraction of it does not survive.

Cursor's own May 28 Habits Report confirmed the volume side. Lines added per developer per week rose from 3,600 in January 2025 to 8,600 in May 2026, a 2.4x increase. Mega-PRs over 1,000 lines climbed from 8% to 13.8% of all PRs. The fraction of agent-generated changes reaching commits without a separate manual diff acceptance step rose from 7% to 36.3%. Generation rates and acceptance velocity are both compounding.

Most teams cannot see this. The metrics they track were designed for human-written code at human pace. PRs ship, tests pass, the deploy pipeline turns green. The slop accumulates underneath and shows up as a slow rise in incident rate, a creeping increase in the time it takes to make a small change, and a steady erosion of the team's sense that the codebase is in a known state.

What slop actually looks like in a file

The most common patterns documented in the past year of code reviews and incident postmortems:

Duplicated utility functions. The agent reads the file it is editing. It does not search the whole codebase for an existing helper. When the project already has formatDate in src/utils/format.ts, the agent writes a second formatDate in whatever file it happens to be editing. Both functions now exist, both work, both pass tests. The duplication is invisible until somebody fixes a bug in one and the other keeps shipping the unfixed behavior. CodeRabbit's analysis of 470 PRs published in the 2026 State of AI Code Review found duplicate-helper patterns in 17% of agent-authored PRs that landed without revision.
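A minimal sketch of how the two copies drift apart, written in Python for brevity. The names format_date and format_date_copy are hypothetical stand-ins for the canonical formatDate and the agent's second copy, and the padding bug is an invented example, not one taken from the studies cited here.

```python
from datetime import date


def format_date(d: date) -> str:
    """Canonical helper (hypothetical stand-in for formatDate in src/utils/format.ts)."""
    # This copy received a fix: month and day are zero-padded.
    return f"{d.year}-{d.month:02d}-{d.day:02d}"


def format_date_copy(d: date) -> str:
    """Second copy written inline in another file (hypothetical)."""
    # This copy never received the padding fix.
    return f"{d.year}-{d.month}-{d.day}"


# A test that only uses two-digit months and days passes for both copies.
sample = date(2026, 11, 30)
assert format_date(sample) == format_date_copy(sample) == "2026-11-30"

# A single-digit month or day exposes the divergence.
edge = date(2026, 4, 3)
print(format_date(edge))       # 2026-04-03
print(format_date_copy(edge))  # 2026-4-3
```

Both copies stay green as long as the test data happens to avoid the edge case, which is why the duplication tends to surface in production rather than in CI.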

Vibe-coded features that pass tests then fail at the edge. Karpathy's framing of vibe coding has stuck because it names something the industry was already doing. The developer prompts an outcome. The agent writes the code. The developer runs the tests. The tests pass. The developer ships. The step in that loop where the developer reads and understands the code has been quietly removed. The Stack Overflow 2026 Developer Survey reports that 46% of daily AI users describe at least one production incident in the past 6 months traceable to code they did not closely read before merging.

Convention drift. Models default to the most-common-on-GitHub patterns from their training distribution. Your team's actual conventions diverge from that distribution in small ways: how errors are wrapped, how config is loaded, how async cleanup is handled. The AI writes against the public average. Your reviewer might catch the drift on the first PR. By the tenth PR with the same drift, the reviewer is exhausted and the codebase has two conventions running in parallel.

Comments that document the wrong thing. The agent generates code and the agent generates the comment about the code. Both are plausible. The comment describes what the agent intended. The code does something slightly different. Without an independent review, the comment becomes the wrong source of truth for the next person reading the file.

What slop costs

The aggregate cost is the headline number. Entelligence's 2026 cost study across 2,444 companies found that for every dollar a team spends on AI coding tokens, 44 cents goes to fixing bugs the AI introduced and 27 cents to rewriting AI output. The study puts total downstream cleanup at 82 cents of every dollar of AI coding spend, of which those two items account for 71 cents.

The local cost is harder to measure but easier to feel. Incident rate per engineer per quarter is the cleanest signal if your team tracks it. Time-to-make-a-small-change is the second. Senior engineers report spending more of their week on review and refactor and less of their week on design and net-new work. That last shift does not show up in dashboards until the senior engineer leaves. By the time it does, the codebase has been mediated through a tool that was not asked whether the result was something the team would want to maintain.

The three signals that make slop visible

You can measure these starting this week. None of them require a new vendor.

Duplication delta per PR. Run jscpd (or any duplication detector for your language) on the PR branch and on main. Subtract. Fail any PR whose delta exceeds a small threshold. The first few PRs after enabling this will trip the alarm. Each one is a place where the agent wrote a duplicate of code that already existed. Fix those, then the alarm goes silent. When it starts ringing again later, you know slop is being added.
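The subtraction step can be sketched as below, assuming an earlier CI step has already run the detector on both branches and written a JSON report for each. The key layout statistics.total.duplicatedLines and the threshold of 10 lines are assumptions for illustration; check them against what your detector actually emits before relying on this.

```python
import json
from pathlib import Path


def load_report(path: str) -> dict:
    """Load a JSON report written by the detector in an earlier CI step."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def duplicated_lines(report: dict) -> int:
    """Read the duplicated-line count from a parsed report.

    Assumes the layout statistics.total.duplicatedLines; adjust the keys
    to match the report format of the detector you use.
    """
    return int(report["statistics"]["total"]["duplicatedLines"])


def duplication_delta(main_report: dict, branch_report: dict) -> int:
    """Duplicated lines on the PR branch minus duplicated lines on main."""
    return duplicated_lines(branch_report) - duplicated_lines(main_report)


def passes(main_report: dict, branch_report: dict, threshold: int = 10) -> bool:
    """Return True if the delta is within the (hypothetical) threshold."""
    return duplication_delta(main_report, branch_report) <= threshold


if __name__ == "__main__":
    # Inline stand-ins for two reports; in CI these would come from load_report().
    main_report = {"statistics": {"total": {"duplicatedLines": 120}}}
    branch_report = {"statistics": {"total": {"duplicatedLines": 168}}}
    print(duplication_delta(main_report, branch_report))  # 48
    print(passes(main_report, branch_report))             # False
```

Comparing against main rather than against zero is the point of the design: the check tolerates whatever duplication already exists and only reacts to what the PR adds.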

Code churn by file. Add a job that computes, for each file changed in the last 30 days, the share of added lines that were deleted again within 14 days of landing. Sort descending. The top of the list is the slop bucket. High-churn files tend to be AI-edited files with insufficient review. Use the list to decide where to slow down and apply human review.
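One way to pin down the metric is sketched below. It assumes you already have one record per added line with the date it landed and, if applicable, the date it was deleted; extracting those records from git history is left out. LineRecord and churn_by_file are hypothetical names, and the ratio used here (short-lived lines divided by lines added in the window) is one reasonable reading of the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class LineRecord:
    """One added line (hypothetical data model)."""
    path: str
    added_on: date
    deleted_on: Optional[date] = None  # None means the line still exists


def churn_by_file(records, today, window_days=30, churn_days=14):
    """Return [(path, ratio)] sorted by ratio, highest first.

    ratio = lines deleted within churn_days of being added
            / lines added within the last window_days
    """
    window_start = today - timedelta(days=window_days)
    added = defaultdict(int)
    churned = defaultdict(int)
    for r in records:
        if r.added_on < window_start:
            continue  # line landed before the window; ignore it
        added[r.path] += 1
        if r.deleted_on is not None and (r.deleted_on - r.added_on).days <= churn_days:
            churned[r.path] += 1
    ratios = [(path, churned[path] / added[path]) for path in added]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    records = [
        LineRecord("src/a.ts", date(2026, 4, 10), date(2026, 4, 15)),  # churned
        LineRecord("src/a.ts", date(2026, 4, 10)),                     # still alive
        LineRecord("src/a.ts", date(2026, 4, 10), date(2026, 4, 28)),  # deleted after 18 days
        LineRecord("src/b.ts", date(2026, 4, 20), date(2026, 4, 21)),  # churned
        LineRecord("src/b.ts", date(2026, 4, 20), date(2026, 4, 22)),  # churned
        LineRecord("src/c.ts", date(2026, 3, 1), date(2026, 3, 2)),    # outside the window
    ]
    for path, ratio in churn_by_file(records, today=date(2026, 4, 30)):
        print(f"{path}: {ratio:.2f}")
```

With the sample records, src/b.ts sorts first because both of its added lines were deleted within 14 days, while src/a.ts lost only one of three.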

Function-name collision warnings. Add a pre-merge check that scans the changed files for new function declarations and greps the rest of the codebase for matching names. Flag every collision. Not every collision is a duplicate; some are legitimate per-module helpers. But every collision is a question that a reviewer should answer before the PR merges.
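A rough sketch of the collision check follows. It assumes the inputs are already in memory: added_code maps each changed file to the text the PR adds, and rest maps every other file to its contents. The regular expressions are a deliberately crude approximation of TypeScript and JavaScript declarations (they miss generics, class methods, and more), so treat them as placeholders for a real parser.

```python
import re

# Approximate declaration patterns; illustrative assumptions, not a full grammar.
DECLARATION_PATTERNS = [
    # function formatDate(...)
    re.compile(r"\bfunction\s+([A-Za-z_$][\w$]*)\s*\("),
    # const formatDate = (...) =>   or   const formatDate = async (...) =>
    re.compile(r"\b(?:const|let|var)\s+([A-Za-z_$][\w$]*)\s*=\s*(?:async\s*)?\("),
]


def declared_functions(source: str) -> set[str]:
    """Return the function names that appear to be declared in source."""
    names: set[str] = set()
    for pattern in DECLARATION_PATTERNS:
        names.update(pattern.findall(source))
    return names


def find_collisions(added_code: dict[str, str], rest: dict[str, str]) -> dict[str, list[str]]:
    """Map each newly declared name to the existing files that declare it too."""
    existing: dict[str, list[str]] = {}
    for path, source in rest.items():
        for name in declared_functions(source):
            existing.setdefault(name, []).append(path)

    collisions: dict[str, list[str]] = {}
    for source in added_code.values():
        for name in declared_functions(source):
            if name in existing:
                collisions[name] = sorted(existing[name])
    return collisions


if __name__ == "__main__":
    added_code = {
        "src/pages/report.ts": (
            "function formatDate(d: Date) { return d.toISOString(); }\n"
            "const total = (xs: number[]) => xs.length;\n"
        ),
    }
    rest = {
        "src/utils/format.ts": "export function formatDate(d: Date): string { return ''; }",
    }
    print(find_collisions(added_code, rest))
    # {'formatDate': ['src/utils/format.ts']}
```

The output is a list of questions, not verdicts: the check reports where a name already exists and leaves it to the reviewer to decide whether the new declaration is a duplicate.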

The three together cost about an hour of CI setup and a few lines of YAML each. They produce a leading indicator of slop accumulation that DORA metrics and conventional CI miss.

What to do this week

Add the three signals to your pipeline. Watch the duplication delta and churn-by-file numbers for one full sprint. After that sprint, you will have a measured view of where slop is accumulating in your repo. Bring it to your next architecture review. Decide where to intervene first based on what the data shows, not on what felt slow.

The teams I have talked to that have done this for two sprints all reported the same pattern: the worst slop concentrates in a small number of files, written by a small number of developers using AI without strong review discipline. Naming the files and showing the numbers in a sprint review changes the behavior more reliably than any policy memo about AI use.

Slop is a measurable phenomenon. Treating it as a measurable phenomenon is the first step in containing it.