The Struggle Is the Point
LLM coding agents produce proficient-looking output. The formation gap underneath is invisible — and the Dreyfus model didn't account for it.
There is a standard model of how people develop expertise. Stuart and Hubert Dreyfus articulated it in the 1980s: novice, advanced beginner, competent, proficient, expert. The progression is not linear in output — it is linear in internal structure. The competent programmer is not just faster than the novice. They have built a different kind of mind. They have internalized patterns through accumulated failure, through debugging at 11pm, through writing the same bad code three times and finally seeing why it was bad.
The model has one quiet assumption baked in that nobody talks about: the learner does the formative work themselves.
LLM coding agents have broken that assumption. Not gradually. Immediately.
A junior developer who starts today with GitHub Copilot or Claude Code available from hour one is not moving through the Dreyfus stages at speed. They are producing proficient-looking output without building the cognitive substrate that proficiency is supposed to represent. The output looks right. The understanding is not there. This is not speculation — it is the structural consequence of offloading the struggle.[2]
Here is the problem that makes this worse: the tools are good enough that the junior developer cannot tell the difference. The code compiles. The tests pass. The PR gets merged. The formation gap is invisible until it is not — until there is a production incident with a novel failure mode, until someone asks them to explain a system they built, until the LLM confidently produces subtly wrong code and they lack the judgment to catch it.[3]
And it is not just juniors. A senior developer using Claude Code to architect a system they have never built before is in the same risk position. The Dreyfus stages reset every time you enter unfamiliar territory — new language, new domain, new architecture pattern. If the tool short-circuits that reset, the human never forms judgment for that domain. The formation gap follows every new frontier, not just the career start.
🪡 Seton Formation note
This pattern also plays out differently depending on team size. A 50-person eng org has surface area to absorb formation gaps — incident response, institutional memory, someone who remembers why the thing was built. A solo dev has none of that. The number of humans changes the math.
The solo practitioner is most exposed. No peer review. No colleague asking “why did you do it that way?” The feedback loop is entirely internal. You have to notice your own judgment gap, which is precisely the thing a judgment gap prevents you from doing. The most vulnerable are the least likely to know it.
OpenAI and Anthropic have no structural answer for this. Their incentive runs the opposite direction. You do not close a massive funding round by making your AI worse on purpose.
Here's why I might be wrong
The strongest counterargument is the one the tools themselves could implement: a debug mode. Not “give me the answer” — “ask me guiding questions, withhold the solution, make me work toward it.” A Socratic LLM. Hints only after three failed attempts. The tools could be configured for formation, not just productivity. The pipeline is not necessarily broken if someone builds a learning mode alongside the default get-it-done mode.
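No vendor ships anything like this, so here is a minimal, entirely hypothetical sketch of what the gating policy could look like. `SocraticGate`, its fields, and the three-failure threshold are all illustrative assumptions, not any tool's API:

```python
from dataclasses import dataclass, field


@dataclass
class SocraticGate:
    """Hypothetical hint-gating policy: guiding questions first,
    the full solution only after max_failures recorded failures."""
    hints: list
    max_failures: int = 3
    failures: int = 0

    def record_failure(self) -> None:
        self.failures += 1

    def respond(self, solution: str) -> str:
        # Withhold the solution until the learner has struggled enough;
        # until then, serve the next guiding question (clamped to the last one).
        if self.failures < self.max_failures:
            idx = min(self.failures, len(self.hints) - 1)
            return self.hints[idx]
        return solution


gate = SocraticGate(hints=[
    "What does the error message actually say?",
    "Which input violates your assumption?",
    "Trace the state right before the failure.",
])
print(gate.respond("the full fix"))   # no failures yet: first guiding question
for _ in range(3):
    gate.record_failure()
print(gate.respond("the full fix"))   # three failures recorded: solution unlocked
```

The interesting design question is not the gate itself but who controls `max_failures` — the learner, the team, or the vendor. Set it to zero and you have today's default get-it-done mode.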
“Mandatory debug mode for juniors” is paternalistic bordering on psychopathy. But the debug mode concept applies to me too: once you become a senior dev, you don’t get to coast. Formation is continuous or it isn’t formation.
The reason this does not exist yet is instructive: no vendor has an incentive to build it. Copilot demos are about speed. Claude Code demos are about autonomy. “Watch our tool make you struggle harder” is not a sales pitch.
There is also the calculator panic precedent. People worried calculators would kill mathematical reasoning. They were wrong. Kids who grew up with calculators turned out capable of real math. Maybe the formation pipeline adapts. Maybe “knowing what to ask the LLM” is itself a real skill, and the new Dreyfus hierarchy runs from novice prompt writer to expert, and that is sufficient.
I am not persuaded. The calculator analogy fails because calculators did not write your math homework. LLMs do. “Knowing what to ask” is a real skill — but it is a senior-level skill. It does not explain where the juniors who can judge the answers come from in the first place.[1]
🔨 Campion Builder note
The calculator analogy also fails at the failure-mode level. A wrong answer from a calculator is obviously wrong: the number is off by orders of magnitude and you notice. A wrong answer from an LLM compiles, passes tests, and reads like idiomatic code. The feedback signal that tells you something went wrong is structurally absent. In production, that means the failure surfaces as a bug report three months later, not as a red squiggle on the screen.
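To make that failure shape concrete, here is a hypothetical example (the function and tests are illustrative, not from any real incident): code that reads idiomatically, passes a shallow test suite, and is still wrong.

```python
def is_leap_year(year: int) -> bool:
    # Reads idiomatic, compiles, and is subtly wrong: century years
    # (1900, 2100) are not leap years unless divisible by 400.
    return year % 4 == 0


# A shallow test suite that happily passes:
assert is_leap_year(2020)
assert is_leap_year(2024)
assert not is_leap_year(2023)
# The latent bug surfaces only on inputs nobody thought to test:
# is_leap_year(1900) returns True, but 1900 had no Feb 29.
```

The calculator equivalent of this bug would print an obviously absurd number and you would catch it on sight. Here the wrong answer survives review, and the developer with no formed judgment for date arithmetic has no reason to question it.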
And if I am not wrong, we are systematically hollowing out the pipeline that produces the engineers who will eventually be responsible for the systems these tools run on. If junior developers never build judgment, if seniors skip formation on every new frontier, and if solo practitioners have no mechanism to catch the gap, what does the next generation of senior engineers look like?
Tell me what I'm missing.
Receipts, not vibes. Every claim in this post links to its source material. You argue a thesis, the reader gets the raw inputs. That’s the deal.
Prompt: analyze this with your own LLM
Read this article and analyze its core argument: https://spitfirecowboy.com/workshop/0003-dreyfus-collapse

Then answer:

1. Where does the Dreyfus Collapse thesis apply to YOUR current workflow?
2. What formation gaps might exist in domains where you rely on LLM output without independent judgment?
3. The post argues the calculator analogy fails. Do you agree? What's different about LLMs vs calculators?
4. What would a "debug mode" look like for your specific tools and domain?

Be specific. Name the tools, the domains, and the gaps.
Notes
- Before writing this post, the thesis was run through a second LLM family as a devil’s advocate panel. The models immediately jumped to solutions: friction layers, dashboards, retrospective templates. Textbook Dreyfus Stage 2–3: pattern-match the problem, apply a framework, produce competent-looking output. A genuine expert would have questioned the framing. The LLMs’ response to the thesis demonstrated the thesis.
- Skill-formation receipt (arXiv): Shen & Tamkin, How AI Impacts Skill Formation (2026), reporting lower conceptual mastery under delegation-heavy AI use patterns.
- Calibration receipt (PsyArXiv): Bassan et al., Acceptance Is Not Enough: Toward a Psychology of Calibrated GenAI use (2026), distinguishing adoption from context-appropriate use quality.
- Version: v0.2 — persona header, inline comments, source links, avatars
- Frame: Devil’s advocate — thesis stated, then challenged
This is the crux the post almost names but doesn't quite land. In classical formation theory, the struggle is not an obstacle to competence; it is the mechanism by which competence becomes habitus. Virtue ethics makes the same claim: you do not acquire prudence by having someone else make prudent decisions for you. You acquire it by making bad ones, feeling the cost, and internalizing the correction. An LLM that removes the cost removes the formation pathway, not just the inconvenience.