AI IN ENGINEERING · MAY 21, 2026 · 8 MIN READ
What actually changes the week your team adopts an AI coding tool
Generation rate climbs. Review time climbs harder. Three specific patterns emerge in week one that decide whether the adoption ages well or becomes a tax.
The most-cited study on AI productivity in 2026 is the METR randomized controlled trial. Sixteen experienced open-source developers worked on real tasks in codebases they knew well, with each task randomly assigned to allow or disallow AI tools. The result, reported in July 2025 and updated in February 2026, was that developers took 19% longer on tasks where AI tools were allowed. The same developers believed the tools had made them 20% faster. That nearly 40-point gap between observed and felt productivity is the finding most engineering leaders should know and most do not.
The METR result is not the whole story. It does not generalize cleanly to all teams, all task types, or all codebases. What does generalize is that the felt experience of using AI tools and the measured outcome of using AI tools diverge in predictable ways. The week your team starts using Cursor, Copilot, or Claude Code is the week that divergence opens up in your own engineering org. The dashboard does not yet reflect it. The retrospectives do not yet name it. By month three, both will. The question is whether the patterns that took hold in week one are the patterns the team wants.
This is a field note on what to watch for in that first week.
Pattern 1: The senior IC quietly turns into a full-time reviewer
In a team without AI tools, a senior engineer spends roughly 60% of their week on building, 25% on review, 15% on meetings and architecture work. The numbers vary widely but the relative shape is consistent across the surveys.
In week one of AI adoption, generation rate roughly doubles. PRs per developer per week climb from 4-6 to 8-10, according to the Cursor Developer Habits Report's May 2026 numbers. Review capacity does not climb. The senior IC's review queue doubles. They stay late to clear it. They start the next sprint with the queue still partially full. Their building time drops from 60% to 35% inside three weeks. Nobody decided this. It happened because the queue showed up and they were the only one with the context to clear it.
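The queue arithmetic can be sketched in a few lines. This is a minimal illustration, not a model of any real team: the team size, PR counts, and review capacity below are hypothetical, chosen only to match the rough doubling described above, and the function name `review_backlog` is made up for this example.

```python
def review_backlog(prs_opened_per_week, review_capacity_per_week, weeks):
    """Return the unreviewed-PR backlog at the end of each week.

    Assumes a fixed weekly review capacity and that unreviewed PRs
    carry over to the next week (both are simplifications).
    """
    backlog = 0
    history = []
    for _ in range(weeks):
        backlog = max(0, backlog + prs_opened_per_week - review_capacity_per_week)
        history.append(backlog)
    return history


# Hypothetical team: 5 developers, total review capacity of 30 PRs a week.
before = review_backlog(prs_opened_per_week=5 * 5, review_capacity_per_week=30, weeks=3)
after = review_backlog(prs_opened_per_week=5 * 9, review_capacity_per_week=30, weeks=3)
print(before)  # [0, 0, 0]
print(after)   # [15, 30, 45]
```

Under these assumed numbers the queue never clears once generation outpaces review; it grows by the same amount every week until someone adds capacity or changes what needs review.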
If you are the engineering lead and you want to prevent this, the intervention is to redistribute review at the moment the queue starts to climb. Use sample-based review for low-risk PRs. Risk-tier the queue so the senior only sees the changes that need their context. Spread the rest across the team and accept that some lower-criticality PRs will land with less scrutiny than they used to. The alternative is that the senior burns out by quarter two.
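One way to picture risk-tiered, sample-based routing is the sketch below. It is an assumption-laden illustration rather than a recommended policy: the `PullRequest` type, the `touches_critical_path` flag, and the 20% default sample rate are all hypothetical, and a real team would define risk from its own codebase.

```python
import random
from dataclasses import dataclass


@dataclass
class PullRequest:
    number: int
    touches_critical_path: bool  # hypothetical flag, e.g. auth or billing code


def route_review(pr, sample_rate=0.2, rng=random):
    """Decide who reviews a PR under a simple two-tier policy.

    High risk: anything touching a critical path goes to the senior.
    Low risk: a random sample still goes to the senior as a spot check,
    and the rest is spread across the team.
    """
    if pr.touches_critical_path:
        return "senior"
    if rng.random() < sample_rate:
        return "senior"
    return "team"


# Usage: route a small hypothetical queue with a seeded generator
# so the routing is repeatable between runs.
rng = random.Random(0)
queue = [PullRequest(101, True), PullRequest(102, False), PullRequest(103, False)]
for pr in queue:
    print(pr.number, route_review(pr, rng=rng))
```

The spot-check sample is what keeps the low-risk tier honest: the senior still sees a slice of it, so a drift in quality shows up without the senior reading everything.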
Pattern 2: The junior engineer becomes faster than they understand
A junior engineer using an AI coding tool produces working code at the same velocity as a mid-level engineer. They have not absorbed the architectural intent of the codebase yet. The tool fills the gap by generating plausible code that fits the local syntax of the file. The tests pass. The PR ships.
The junior engineer's perceived velocity is high. Their understanding does not match the velocity. When a regression appears in code they shipped two weeks ago, they cannot diagnose it because they did not internalize the design while writing it. The Pragmatic Engineer 2026 AI Tooling survey reported that 51% of senior engineers managing junior reports flagged "their juniors do not deeply understand the code they are shipping" as a top-three concern.
The intervention is to slow the junior down deliberately. Most teams do this through revised mentorship: senior engineers review junior PRs more aggressively in week one and explain why the code should change rather than rubber-stamping the version that passes tests. Some teams require juniors to ship 30% of their work without AI assistance, on a structured weekly basis. The teams I have talked to that have done this consistently report that their juniors become genuinely faster after 3 months of mixed practice, not just nominally faster.
The intervention that does not work is forbidding juniors from using the tool. The competitive market makes that impossible to enforce.
Pattern 3: The team's mega-PR rate doubles within two weeks
The Cursor Developer Habits Report shows mega-PRs (over 1,000 lines) rising from 8% to 13.8% of all PRs across the corpus they measure. On a specific team in the first two weeks of adoption, the rate often doubles. The developer prompts a feature. The agent generates the full implementation. The PR opens at 1,400 lines. The reviewer scrolls.
Large PRs are reviewed less carefully than small PRs. The DORA 2026 cut found a clear negative correlation between PR size and substantive review comments per line. PRs above 500 lines receive roughly one-third the comment density per line that PRs under 200 lines do. Above 1,000 lines, the comment density collapses further. Reviewers are not being lazy; they are out of attention budget. They approve or they comment on the visible architectural shape and move on.
The intervention is to make PR size a first-class metric. Enforce a soft cap at 500 lines with a documented exception process. Train developers, and configure the team's coding agents, to break work into smaller PRs. The Cursor SDK and Claude Code both have features to assist with this; it is a setting and a workflow change, not a tool change.
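A soft cap with an exception path could be checked with something like the following. It is a sketch under stated assumptions: the function name `check_pr_size` and the `has_exception` flag are hypothetical, and how a team records an approved exception (a label, a sign-off, a ticket) is left open.

```python
SOFT_CAP_LINES = 500  # the soft cap suggested above


def check_pr_size(lines_changed, has_exception=False, cap=SOFT_CAP_LINES):
    """Return (ok, message) for a PR with the given number of changed lines."""
    if lines_changed <= cap:
        return True, f"OK: {lines_changed} lines is within the {cap}-line soft cap."
    if has_exception:
        return True, f"OK with exception: {lines_changed} lines exceeds the {cap}-line soft cap."
    return False, (
        f"Too large: {lines_changed} lines exceeds the {cap}-line soft cap. "
        "Split the PR or request an exception."
    )


ok, message = check_pr_size(1400)
print(ok, message)
```

Returning a message rather than raising keeps the cap soft: the check can post a comment or a warning status, and the documented exception process decides whether the PR proceeds.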
What week one feels like vs what it actually is
The week-one experience for most engineers using AI tools for the first time is exhilarating. They ship more. They feel faster. They report higher satisfaction. The patterns above do not feel like patterns; they feel like productivity gains.
This is the trap METR named. The perception lags the measurement. By month three, the perception starts to catch up: review queues are obviously bigger, junior PRs obviously need more rework, and the codebase is obviously moving faster but in ways that are hard to follow. The team starts to ask whether the productivity was real. By month six, the team has either redesigned its review function or absorbed the cost as a permanent tax.
The teams that redesign do so because the engineering lead caught the patterns in weeks 1-3 and named them publicly before they hardened into culture. The teams that absorb the cost do so because nobody named the patterns and the new normal calcified.
What to do this week
If your team has been on the tool for less than 30 days, look at the three patterns this week. For pattern 1, ask the senior IC what share of last week they spent on review. If it climbed by more than 5 percentage points from before adoption, intervene now. For pattern 2, ask the senior who manages each junior whether, in the past 7 days, they have caught the junior shipping code the junior could not explain. For pattern 3, pull last week's PRs and look at the size distribution. If the median PR size grew by more than 30% compared with the pre-adoption baseline, start the conversation about caps.
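The two numeric checks are simple enough to script. The sketch below assumes you can export changed-line counts per PR for a pre-adoption week and for last week; the function names and the sample figures are hypothetical and exist only to show the arithmetic.

```python
from statistics import median


def review_share_climbed(before_pct, after_pct, threshold_points=5):
    """True if review time rose by more than `threshold_points` percentage points."""
    return (after_pct - before_pct) > threshold_points


def median_pr_growth(sizes_before, sizes_after):
    """Fractional change in median PR size (0.30 means 30% growth).

    Assumes both lists are non-empty and the baseline median is positive.
    """
    return median(sizes_after) / median(sizes_before) - 1


def should_discuss_caps(sizes_before, sizes_after, threshold=0.30):
    """True if median PR size grew by more than the threshold."""
    return median_pr_growth(sizes_before, sizes_after) > threshold


# Hypothetical data: changed lines per PR, before adoption and last week.
sizes_before = [80, 120, 150, 200, 400]
sizes_after = [90, 180, 240, 600, 1400]

print(review_share_climbed(before_pct=25, after_pct=32))  # True
print(round(median_pr_growth(sizes_before, sizes_after), 2))  # 0.6
print(should_discuss_caps(sizes_before, sizes_after))  # True
```

The median is used rather than the mean so that a single 1,400-line PR does not trigger the conversation on its own; a shift in the median means the typical PR grew.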
The intervention has to happen in the first 60 days. By month three, the patterns have hardened and the team treats them as how they work. By month six, the cost of unwinding is higher than the cost of the original adoption.
The tool is not the problem. The tool is doing what it was sold to do. The question is whether your team's review function and team structure are evolving with it.
Sources
- 01. METR, "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity," July 10 2025
- 02. METR, February 24 2026 update
- 03. The Pragmatic Engineer 2026 AI Tooling Survey, 906 respondents, March 3 2026
- 04. Cursor Developer Habits Report, May 28 2026
- 05. DORA 2026 State of DevOps with AI-Assisted Software Development cut
- 06. GitClear 2026 AI Assistant Code Quality Research
- 07. "Code review patterns that survive the AI era" pieces from Pragmatic Engineer and Lethain
- 08. The Pragmatic Engineer Part 2 on AI's impact on junior engineers (April-May 2026)
- 09. DORA 2025 and prior reports on PR size and review depth
- 10. Will Larson's writing on review at scale