By AJ · March 2026

The code is solved.
The product isn't.

Last week I shipped a cross-platform family calendar app. TypeScript API running on Cloudflare Workers. React web app with TanStack Query. Native SwiftUI iPad app with types auto-generated from an OpenAPI spec. A landing page. Two blog posts. An email intake pipeline that reads school newsletters and creates calendar events automatically.

It took seven days. And I don't remember opening a code editor.

I've been writing software for fifteen years. Nothing in my experience prepared me for how that felt.

What "vibe coding" actually looks like

The term gets used loosely; people mean different things by it. What I did was closer to architecture and direction: I wrote the plan, defined the types, made the structural decisions, described what I wanted. Claude wrote the implementation. Thousands of lines of it, across four codebases, consistently following the patterns I'd established.

When I wanted drag-to-reschedule events in the calendar week view, I described the interaction and it appeared — complete, handling edge cases I hadn't thought to mention. When I needed DOCX attachment parsing in a Cloudflare Worker, it reached for the right library, wired it in correctly, and handled the binary encoding. When the iOS auth layer needed refactoring to use the Supabase Swift SDK instead of manual JWT handling, that was a few minutes of directed conversation.

The git history has 57 commits in seven days. Most of them include the line Co-Authored-By: Claude Sonnet 4.6.

This part — writing code — is genuinely solved. Not "good enough to be useful." Solved. If you can describe what you want clearly, you can have it built. The bottleneck has moved.

Setting up the collaboration

Before we started, I asked one question that I think most people miss: not "what's the best stack for this?" but "which libraries are most present in your training data?"

That's a different question. It's not asking for the theoretically optimal choice — it's asking what the AI can work with most confidently, where the risk of subtle hallucination is lowest, where the patterns are deepest. The answer shaped everything: Hono for the API layer, Supabase for data and auth, TanStack Query on the web, Zod for schemas, shadcn/ui for components. Not necessarily what I'd have chosen alone. What we could work with together.

The second thing I did was write the collaboration contract: a CLAUDE.md file that lives in the project root and gets read at the start of every session. It has architecture rules — every backend operation follows a strict Route → Service → Provider layering, no exceptions. It has type rules — Zod schemas are defined first and generate TypeScript types, the OpenAPI spec, and Swift client types all at once. And it has this at the top:

Boring is good. Boilerplate is good. Don't be clever.

That line is written specifically to fight the AI's natural tendencies. Left to its own devices, it will reach for abstractions, eliminate repetition, find the elegant solution. That's often exactly wrong in a codebase you need to read and maintain. The rules aren't just good engineering practice — they're constraints on a collaborator that will otherwise drift toward cleverness across dozens of sessions. Writing them was a form of product thinking applied to the development process itself.

The types-first rule had a particularly important effect: because one Zod schema generates types for TypeScript, the API spec, and the Swift client simultaneously, four codebases stayed in sync without any manual coordination. Every time a field changed, everything changed together. That's not magic — it's a deliberate architectural decision that only works if you commit to it from day one.

What I had to bring to the collaboration

This isn't a story about AI doing the easy parts while I did the hard parts. The honest version is messier: we were collaborating throughout — working through tradeoffs together, building on each other's thinking. The difference was in what I had to bring to that conversation: things I had to know before it could go anywhere useful.

The most important sentence in this entire project is buried in a planning document. It reads: "The problem isn't displaying calendars — families already have calendars. The problem is getting information INTO the calendar."

That insight came from sitting with the problem — from noticing that the families I was thinking about already had Google Calendar, already had Apple Calendar, and still somehow had a fridge covered in school letters and a parent lying awake at 11pm trying to remember which day the photographer was coming. Once I brought that insight to the conversation, everything clicked into place. But I had to bring it. It was the seed the whole collaboration grew from.

That insight restructured the entire product. The display — the beautiful wall-mounted iPad calendar — is just the payoff. The intake pipeline, the mechanism for getting information in from emails and photos and WhatsApp messages, is the actual thing.

Five decisions that were mine to make

Let me be specific about what I had to bring to the table.

1. Knowing what NOT to build

There's a commit titled "Update PLAN.md: skip web display mode, iPad-only." One line. No code changed. The decision: the web app would handle event management, but the wall display was iPad-only. A browser tab defeats the entire point — which is a dedicated screen that's always on, always visible, that nobody has to unlock or navigate to. We talked through the tradeoffs. But the judgment call — that this was wrong for the product — required understanding why someone would want this thing at all. That understanding was mine to bring.

2. What parents actually need from a school email

The AI extraction prompt includes this instruction: "They do NOT need term start/end markers — they know when school is on. They need to know when it's NOT on, and anything out of the ordinary." That sentence determines which events get extracted and which get filtered. It's the difference between a calendar full of noise and one that actually helps. You don't arrive at it by studying calendar apps. You get there by understanding what a tired parent needs to see when they glance at a screen on a Monday morning.

3. Testing with real school emails

Before shipping the intake pipeline, I built an eval harness and ran it against actual emails: "Blue Kite TD day.eml", "Red Nose day - Comic Relief - Friday 20th March.eml", "EYFS term 5 information.eml". Then I checked the output. Did it catch the inset day? Did it correctly span the Easter holiday as one event rather than seven separate days? Did it skip the term dates? That judgment — this one worked, this one missed something important — requires caring about whether the thing actually works. The AI ran the extraction. The caring was mine.
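The harness itself can be small. Here's a sketch of the scoring idea, with hypothetical shapes; the real pipeline's types and checks will differ:

```typescript
// What the extraction pipeline emits (hypothetical shape).
type ExtractedEvent = { title: string; start: string; end: string };

// One eval case per real .eml fixture.
type EvalCase = {
  fixture: string;        // e.g. "Blue Kite TD day.eml"
  mustInclude: string[];  // substrings expected in extracted titles
  mustExclude: string[];  // e.g. plain term start/end markers
};

// Pass iff every expected event was caught and no filtered-out
// noise leaked through.
function scoreCase(c: EvalCase, events: ExtractedEvent[]): boolean {
  const titles = events.map((e) => e.title.toLowerCase());
  const hit = (s: string) => titles.some((t) => t.includes(s.toLowerCase()));
  return c.mustInclude.every(hit) && !c.mustExclude.some(hit);
}
```

The mechanical part (run extraction, compare, report) is trivial. Writing the `mustInclude` and `mustExclude` lists is the judgment.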

4. Absorbing the AI costs

The plan has a step marked "Not yet built": letting families supply their own Anthropic API key to keep costs near zero. I decided not to build it. Not because it's hard — it's a settings page and a key lookup. Because making families manage an API key creates friction at exactly the wrong moment. The product should just work. That's a business model decision, a product philosophy, a bet about what customers will pay for. We could discuss the tradeoffs in detail. But the conviction about what this product is — what kind of thing it should feel like to use — that had to come from me.

5. Writing from someone else's experience

The other blog post on this site is by "Dineke." She describes lying awake at 11pm running through packed lunches and birthday party RSVPs while her partner sleeps. She doesn't exist. But the experience she's describing is real — the research she cites is real (Daminger, 2019; Ciciolla & Luthar, 2019), the numbers are real, the feeling of being the household's single point of failure is very real for a lot of people. The decision to tell the product story from inside that experience, rather than from outside it, isn't something an AI suggests. It's something you decide when you understand who you're building for and what would make them feel seen.

Taste is still yours

There's a dimension of this I haven't mentioned yet, because it's harder to point at in a commit log: how the thing looks and feels.

Claude has internalized a lot of design. Ask it to build a calendar week view and it will produce something structurally sound — correct column widths, reasonable typography, sensible colour choices. Enough to look right. But it doesn't look at the screen. It can't see that two elements are 3px off. It doesn't notice that an event block is clipping its label at a particular length, or that the gap between a section header and the first item feels slightly wrong, or that a colour reads fine in isolation but muddy against the background you chose.

With interactive behaviour, the gap is even wider. The drag-to-reschedule on the iPad calendar — you write the interaction, it works technically, and then you actually drag an event. Does it feel right? Is the threshold before the drag initiates too eager, so scrolling keeps accidentally moving events? Does the drop snap in a way that feels satisfying, or does it lag just enough to feel wrong? Does the visual feedback during the drag communicate what's happening clearly enough that you don't second-guess yourself? None of that is legible from code. You have to use it, form an opinion, and care enough to go back and adjust.
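The threshold question is concrete enough to show. A sketch, with a hypothetical pixel value, of the kind of knob you end up tuning by feel rather than by reasoning:

```typescript
// How far the pointer must travel before a drag begins.
// Too low and scrolling keeps picking up events; too high and
// the drag feels unresponsive. 8 is a guess, not the app's value.
const DRAG_THRESHOLD_PX = 8;

function shouldStartDrag(
  downX: number, downY: number,
  moveX: number, moveY: number,
): boolean {
  return Math.hypot(moveX - downX, moveY - downY) > DRAG_THRESHOLD_PX;
}
```

The code is a one-liner either way. Whether 4, 8, or 12 is right is unknowable from the source; you find out with a thumb on the screen.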

The same is true of the wall display mode — the iPad mounted on the kitchen wall, screen always on, showing the family's day. Whether that feels like something you'd actually want in your home isn't something you can evaluate from a simulator. You have to stand in front of it. The font needs to be readable from across the room. The member cards need to be distinguishable at a glance. The clock that appears when it's idle needs to feel calm, not clinical. Those are aesthetic judgments that require eyes, presence, and a specific kind of caring — does this feel right for a family's home? — that can't be delegated.

This is where the collaboration model is most honest. Claude has good taste in the abstract. But taste applied to a specific thing, in a specific context, for specific people, at a specific moment — that's still a human job. Noticing that something is almost right but not quite, and caring enough to fix it: that's the work.

The thing that changed

For most of software history, execution was expensive. A good idea still cost six months of engineering time to test. That cost shaped everything: how carefully you designed before building, how risk-averse you were with bets, how much of the product was determined by what the team could build rather than what users needed.

That cost is now close to zero. Which means the bottleneck has moved entirely to the question: is this the right thing to build?

The 57 commits were written mostly by Claude. But the planning document that says what to build and why — the document that encodes the insight about intake versus display, the decision about native iPad, the judgment about what parents need from a school email — that was written by a person. And the plan is the product.

I think we've been asking the wrong question. "Will AI replace developers?" is a question about execution. The interesting question now is: when building software costs almost nothing, what does the human contribute?

The answer, I think, is the same thing it's always been: taste, judgment, and caring about the person on the other end. The difference is that now, those things are almost the only things.

Code has always been a means to an end. What's new is that the means are nearly free. Which makes the end — the actual problem you're solving, the actual person you're solving it for — more important than it's ever been.

That's not a threat to people who care about building good products. It's the best news they've ever had.