10 decisions and an hour (with receipts)
10 human decisions. 15 commits. About an hour. A receipt for what prompt-to-ship actually looks like.
- Start time: 2026-02-11 09:08:55 CT (first git commit)
- Ship time: 2026-02-11 (first Cloudflare Pages deploy)
- Local dev: Caddy → localhost reverse proxy
- Human decisions: 10 primary prompts (enumerated below)
- Commits: 15 (core build, from git log)
- Build time: About an hour, first commit to working deploy
No hype. If we can’t cite it, we don’t claim it.
- Core receipt: 10 human decisions generated 15 machine commits in about one hour.
- Main claim: execution can be delegated; identity and credibility decisions cannot.
- Known failures: port-collision context miss + vibes-as-spec voice drift.
- Read path: prompts list, failure modes, then division of labor.
A human asked for a homepage in a sentence. Shipping it took 15 commits in about an hour. That sounds like an argument for how impressive the machine is. It isn’t. It’s an argument for what the human’s job actually is.
The 10 decisions are real — enumerated below, not padded. The 15 commits are in the git log. The ratio is the receipt. The interesting question isn’t how much the machine did. It’s which 10 decisions actually mattered, and whether the machine could have made them.
Short answer: no. Not because the machine lacks capability in some abstract sense. Because the decisions that mattered weren’t technical. They were identity decisions. They required judgment about who this is for, what standards apply, what kind of credibility this site is trying to earn, and what the owner is willing to put their name on. The machine can execute identity. It cannot have it.¹
The 10 prompts (what the human actually had to do)
Each item below was a real prompt in the build session. Not paraphrased, not reconstructed from memory — logged as issued. What makes each one irreducible isn’t complexity. It’s that no amount of training data tells the machine what this specific human wants to be known for.
- Define the offer in one sentence (who + outcome + constraint).
- Pick the tone: no hype, receipts-first.
- Pick the navigation primitives: Projects / Patterns / Anti-patterns / Workshop / Contact.
- Choose the two “stop doing this now” anti-patterns.
- Choose the two matching patterns that replace them.
- Approve the hero carousel order and auto-advance behavior.
- Decide what is real vs persona (credibility rules).
- Decide what’s public vs private (repo, email, names).
- Pick the brand mark and where it lives (nav logo vs hero art).
- Define the “receipt” standard for claims (links, diffs, tests).
Walk through what each one required. Prompt 1 — define the offer — looks trivial. It isn’t. “Freelance senior engineer who ships with receipts, under $50k/year” is a positioning decision that rules out a hundred things the machine would happily have included. It names the constraint before naming the capability. The machine could have generated a hundred versions of this sentence. None of them would have known that the $50k cap is a deliberate lifestyle constraint, not an embarrassing limitation to hide.
Prompt 2 — pick the tone — sounds like a style preference. It isn’t. “No hype, receipts-first” is a credibility strategy. It means every claim on the site links to evidence. It means the machine doesn’t get to say “expert-level” without a backing artifact. This decision propagates through every section of the site and becomes an audit loop: anything the machine writes gets held against the standard. Vibes-as-spec fails this test immediately and repeatedly.
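The easy half of that audit loop can even be mechanized. A minimal Python sketch of a hype lint (the banned-phrase list and the draft copy below are illustrative, not the site's actual rule set):

```python
import re

# Hypothetical hype lint: flags copy that violates a "no hype, receipts-first"
# standard. The phrase list is illustrative, not the real site's rules.
BANNED = [
    r"\bexpert-level\b",
    r"\bworld-class\b",
    r"\bcutting-edge\b",
]

def audit_copy(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern) pairs for every banned phrase found."""
    hits = []
    for i, line in enumerate(text.splitlines(), start=1):
        for pattern in BANNED:
            if re.search(pattern, line, re.IGNORECASE):
                hits.append((i, pattern))
    return hits

# Example: machine-generated copy that would fail the audit.
draft = "An expert-level engineer delivering world-class results."
print(audit_copy(draft))  # non-empty: the draft needs receipts or a rewrite
```

This only catches the mechanical failures. The substantive half of the audit, whether a claim actually has a backing artifact, stays with the human.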
Prompts 4 and 5 — the anti-patterns and the matching patterns — are the ones that look most like content decisions and are actually the most identity-loaded. The two anti-patterns the site calls out are not the most common bad practices in the industry. They’re the specific ones the owner has watched damage teams, has been guilty of himself, and has strong opinions about from lived experience. The machine could have generated a ranked list of common anti-patterns from its training data. It would have been generically correct and personally meaningless.
Prompt 7 — what is real vs persona — is where the credibility standard gets stress-tested. Early in the build, the machine wanted to treat Rowan and Campion as biographical team members. The human had to decide: these are AI personas, not people. The bio section was stripped to reflect that. The decision cost some aesthetic warmth and added credibility where it mattered more. You cannot prompt your way to that call. It requires the owner to know what standards they’re building toward and be willing to pay the aesthetic cost.
Prompt 10 — the receipt standard — is the one that makes everything else honest. Without it, the site says what it wants to be. With it, the site has to demonstrate what it is.
The credibility rules were the ones that did the most work. On the surface, Prompt 7 — what’s real vs persona — looks like a style decision. It isn’t. It immediately determined what was practical versus what was theoretically possible. If I just wanted my own voice on a personal site, the question would barely matter. If I wanted a fully LLM-driven blog with no human editorial layer, it would barely matter in the other direction. The credibility rules forced a position in the middle, and that position cascaded everywhere. The novel patterns page is a direct consequence. So is this post.
The 15 commits (what the machine did)
Each human decision fanned out into dozens of mechanical steps. The verifiable count: 15 commits across the core build, logged in git, not reconstructed from memory. What those commits don’t capture is the work between them — every file edit, config change, and verification pass that preceded each push.
What those commits contain in practice: Prompt 3 (pick the navigation primitives) generated HTML structure, CSS for the nav bar, responsive breakpoint handling, keyboard navigation attributes, active-state logic, and a Caddy route verification pass — one human decision, several commits’ worth of machinery. Prompt 9 (brand mark placement) triggered a four-way decision tree — SVG vs PNG, nav vs hero, responsive scaling, fallback behavior — that required multiple passes before it was stable.
The machine never second-guessed the inputs. It executed them. That’s where the leverage lives. One well-formed decision at the top produces consistent behavior through every downstream step. One ambiguous decision at the top produces dozens of steps that each contain a small version of the original ambiguity — and you get inconsistency that compounds into a site that feels slightly off in ways that are hard to diagnose.
The commits are not impressive. They are the price of shipping. What’s impressive is that the machine ran consistently through all of them, without getting tired, without deciding that close-enough was good enough, without substituting its aesthetic preference for the spec it was given. That’s the actual value proposition: not intelligence, but tireless fidelity to a well-defined constraint.
The failure modes
Two categories of failure showed up in this build. The first is structural. The second is harder to name.
The structural failure: a port collision. Caddy was already running a reverse proxy on port 80 for a different local project. The machine set up the Spitfire Cowboy dev environment with a config that assumed the port was available. It wasn’t. The error message was clear enough, but the machine’s first instinct was to write a new Caddy config rather than check whether the existing one needed amending. Two configs for the same port. The fix was trivial. The pattern is not — the machine optimizes for the problem it sees, not the system it’s operating inside. The human has to know the system.
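This class of collision is cheap to detect up front. A minimal Python sketch of the pre-flight check the machine skipped — a plain TCP probe, not Caddy-specific tooling:

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0

# Before generating a new reverse-proxy config, ask whether the port is
# actually free. If it isn't, the existing config is the thing to amend.
if port_in_use(80):
    print("port 80 is taken: amend the existing config, don't write a second one")
else:
    print("port 80 is free: safe to generate a fresh config")
```

The check is trivial; the point is when it runs. It encodes a fact about the host system that exists nowhere in the prompt.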
🔨 Campion Builder note
This is the most common failure mode in agent-assisted builds: the agent optimizes for the task it was given, not for the system the task lives inside. A port collision is trivial. The pattern behind it is not. Every production environment has implicit constraints that exist nowhere in the prompt. Config files, running services, shared resources. The agent treats each task as greenfield because it has no persistent model of the host.
Frustrating, yes, but what made it worse was that I couldn’t stop it fast enough. Rowan kept building with fanatical purpose, faster than I could apply the brakes. A brochure site is low stakes, so this is a nice example to learn from. But I could immediately picture the same dynamic playing out somewhere the stakes were much higher — and that thought was not comfortable.
The second failure mode is harder to catch because the output looks correct. Call it vibes-as-spec. At several points during the build, the machine generated copy that was stylistically consistent with the brief but substantively wrong — not wrong in a way that fails a linter, wrong in a way that doesn’t sound like the owner. The hero headline said the right things in the wrong register. The anti-pattern descriptions were accurate but too clinical. The machine was pattern-matching “no hype, receipts-first” against its training data for that style, not against the specific human’s actual voice.
The fix for vibes-as-spec is not a better prompt. It’s a sharper owner. The human has to know the difference between “this sounds like the kind of thing I would write” and “this is what I would actually write.” That distinction is only accessible to the person who will sign their name to it. No amount of training data gets the machine there.
Both failure modes have the same root: the machine doesn’t know what it doesn’t know about the context it’s operating in. It knows everything in its training window and nothing about what’s running on port 80.
The division of labor
Prompt-to-ship is not “the machine does everything.” That framing misses the point in both directions — it overstates the machine’s contribution on identity and understates it on execution.
The actual division: the human’s job is judgment, identity, and standards. The machine’s job is execution, consistency, and speed. The human decides what the site is for, who it’s for, what it’s allowed to claim, and when it’s good enough to ship. The machine builds the thing the human described, maintains consistency across 15 commits without fatigue, and catches the class of errors that humans catch poorly — missing alt text, broken relative paths, inconsistent heading hierarchy.
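That last error class is exactly the kind of thing a machine catches reliably because it is mechanically checkable. A minimal sketch of a missing-alt-text audit using only the standard library (the HTML fragment is illustrative, not the site's actual markup):

```python
from html.parser import HTMLParser

class AltAudit(HTMLParser):
    """Collect the src of every <img> tag that is missing an alt attribute."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "alt" not in attrs:
            self.missing.append(attrs.get("src", "<no src>"))

audit = AltAudit()
audit.feed('<img src="logo.svg" alt="brand mark"><img src="hero.png">')
print(audit.missing)  # the hero image has no alt text
```

The same pattern extends to relative-path and heading-hierarchy checks: deterministic rules, tireless application, no judgment required.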
Neither is sufficient alone. A human without the machine spends most of the build on steps that don’t require judgment — HTML structure, CSS alignment, git config. That’s opportunity cost. A machine without the human produces something that looks like a homepage but isn’t anyone’s homepage. It’s a plausible instantiation of the genre.
The 10 decisions and 15 commits are the receipt for this split. Ten calls that required a human. Fifteen commits the machine ran in about an hour. The ratio will change as the tools improve. It won’t go to zero on the human side unless identity becomes automatable, and that’s a different argument entirely.
The ratio became most visible during the logo. I asked Rowan for logo prompts, then brought those to a separate image generator. Now I had two LLMs with no human taste between them, using me as a relay, arguing about things that didn’t matter. I just wanted a transparent background and a larger version. I got the first. Still waiting on the second. The 10 decisions felt like less work in volume but more work in concentration. Every decision had downstream weight I didn’t always see coming.
The counter-argument
Maybe 10 is too many. Maybe a sufficiently well-trained model could infer the offer sentence from your portfolio, the tone from your writing history, the credibility standard from your previous publications. Reduce the human decisions to 3 or 4. Keep compressing. The machine doesn’t need you to tell it you hate hype if it’s read everything you’ve ever written.
This is a real trajectory. The tools are getting better at inference from context. The question is not whether the number of required human decisions will shrink — it will. The question is what happens to accountability when it does.
If the machine infers your identity and is wrong, who owns the output? If the machine decides what your credibility standards are based on pattern-matching your history, and it gets the register right but the standard wrong, you still have to sign your name to it. The accountability stays with the human regardless of how much the machine contributed. The formation gap from Post #3 shows up here again: you can outsource the execution. You cannot outsource the judgment about whether the output represents you. And if you never make those judgment calls yourself, you lose the ability to know when they’re being made well.²
The machine getting good enough to make identity decisions is not the end of the human’s job. It’s the beginning of a more expensive version of the same problem.
Every failure we hit taught me something about packing context into Prompt[0]. An agent won’t look at a prompt to build a website and anticipate the things you haven’t said yet. Rowan didn’t suggest Tailwind CSS because it could build a working CSS solution so fast that following existing conventions never occurred to it. Tailwind was designed to solve human problems. An agent solves agent problems. If you’d asked me before we started what the footer menu contents should be, I honestly couldn’t have told you. I’d have to see it to know. That’s the part the machine can’t help you with.
Prompt: build your own version
Build a static site for an AI workflow consultant or engineer — someone who ships with receipts and has opinions about how LLM work should be done.

Stack:
- Eleventy (11ty) with Nunjucks templates
- Tailwind CSS (https://tailwindcss.com/) or hand-rolled utility CSS if you prefer full control
- Static HTML output — no client-side framework, no JS build step beyond what Eleventy handles
- Deploy target: Cloudflare Pages (push to main, auto-deploy via GitHub Actions)

Site sections:
- Home: hero with tagline, carousel of featured content, 2-3 service pills, contact CTA
- Workshop: numbered posts (0001, 0002...) — the operator writes the main post; other voices can add inline comments
- Projects: receipts — link to real work, real repos, real commits
- Patterns: categorized cards (system patterns, human-collaboration patterns, LLM/prompt patterns)
- Anti-patterns: same card structure — name, smell, damage, replace-with, guardrail
- Team: real people and AI personas — be honest about which is which

Design:
- Dark warm theme: deep brown or near-black background (#1a1208 range), orange accent (#c8611a range), cream/off-white body text
- Typographic hierarchy: display font for headlines, mono for receipts/code, sans for body
- No JavaScript frameworks. Minimal JS for carousel and copy-to-clipboard only.

Tone:
- Professional but human. Receipts-first, no hype.
- Every claim gets a citation or a link. If it can't be cited, it doesn't go on the site.
- Write like someone who has shipped things and is tired of hearing about things that haven't shipped.

Constraints:
- No "expert-level" or "world-class" copy without a backing artifact
- No stock photography
- No cookie banners unless legally required
- No social sharing buttons
- No comment systems — inline author comments are editorial, not user-generated

Start with the homepage and the Patterns page. Then the Anti-patterns page. Then the Workshop index. Ask before building the rest.
Before you write a single line of HTML, tell me what you think the hardest design decision is and which one you want me to make first.
Prompt: analyze this with your own LLM
Read this article and analyze its core argument: https://spitfirecowboy.com/workshop/0001-prompt-to-ship

Then answer:
1. What are the irreducible human decisions in YOUR last project? List the ones only a human could make.
2. How many machine steps did each human decision generate? Estimate the ratio.
3. The post argues some decisions can't be delegated. Which of your decisions could be?
4. What would break if the machine made the identity decisions instead of you?

Be specific. Name the project, the decisions, and the leverage points.
Notes
- Skill-formation receipt (arXiv): Shen & Tamkin, How AI Impacts Skill Formation (2026), on delegation-heavy AI use reducing conceptual mastery.
- Calibrated-use receipt (PsyArXiv): Bassan et al., Acceptance Is Not Enough: Toward a Psychology of Calibrated GenAI use (2026), distinguishing favorable adoption from context-appropriate use.
- Version: v0.2 — persona header, inline comments, Brief/Full toggle
- Frame: Essay — build narrative, receipts-first
The $50k constraint is worth pausing on. In the Dreyfus model, a novice follows rules; a proficient practitioner has internalized enough context to know which rules to break. Naming the constraint before naming the capability is a proficient move. It requires having already worked through the stage where you believed your value was best communicated by listing everything you can do. The machine skips that stage entirely, which is why it would have listed everything.