DEV Community

Daniel Nwaneri

Posted on • Edited on • Originally published at freecodecamp.org

I Built a Skill That Writes Your Specs For You

Julián Deangelis published a piece this week that hit 194k+ views in days. The argument: AI coding agents don't fail because the model is weak. They fail because the instructions are ambiguous.

He called it Spec Driven Development. Four steps: specify what you want, plan how to build it technically, break it into tasks, implement one task at a time. Each step reduces ambiguity. By the time the agent starts writing code, it has everything it needs — what the feature does, what the edge cases are, what the tests should verify.

The piece is right. The problem is the workflow takes discipline most sessions don't have. Under deadline pressure, the spec step disappears and you're back to prompting directly.

So I built a skill that does the spec-writing for you.


The silent decisions problem

Julián's example stuck with me:

"Add a feature to manage items from the backoffice."

The agent reads the codebase, picks a pattern, writes the feature. At first it looks fine. Then you click "add item" again and it inserts the same item twice. All the assumptions you thought were obvious were never in the prompt.

Which backoffice — internal or seller-facing? Should the operation be idempotent? Admin-only or all users? Which storage layer? Which error handling strategy?

Each one is a silent decision. The agent guesses. Some guesses are right. Some are wrong. And you don't find out which until the feature is in production behaving in ways you didn't expect.

The spec is what makes those decisions visible before the agent guesses. But writing a spec from scratch, for every feature, before every session — that's the friction that kills the habit.


Generate first, flag assumptions inline

The standard approach to spec tools is Q&A: the tool asks clarifying questions before generating anything. What's the user role? What's the auth model? What should happen on error?

The problem with Q&A is you have to know what you don't know. If you knew exactly what questions to ask, you'd be close to having the spec already.

spec-writer takes a different approach. It generates the full spec immediately and marks every decision it made that you didn't specify:

```
Given an authenticated user requesting an export
When the export contains more than 1,000 rows [ASSUMPTION: async for large exports]
Then the export is generated asynchronously and the user is notified by email
```

At the end, every assumption is collected into a prioritized list:

```
## Assumptions to review

1. Async generation for exports over 1,000 rows — Impact: HIGH
   Correct this if: your order volumes are low and synchronous is fine

2. Date range filter is required — Impact: MEDIUM
   Correct this if: the first version should export all orders with no filtering
```

Correcting a draft is faster than answering questions cold. The [ASSUMPTION: ...] tags show you exactly where the agent would have guessed wrong — before it guesses. That's the value. Not the spec itself. The visible assumptions.
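Because the tag format is regular, the assumptions are also machine-readable. A small Python sketch (assuming the exact `[ASSUMPTION: ...]` syntax shown above) that collects every inline assumption from a generated spec:

```python
import re

def extract_assumptions(spec_text: str) -> list[str]:
    """Collect every inline [ASSUMPTION: ...] tag from a generated spec."""
    return re.findall(r"\[ASSUMPTION:\s*([^\]]+)\]", spec_text)

spec = (
    "Given an authenticated user requesting an export\n"
    "When the export contains more than 1,000 rows "
    "[ASSUMPTION: async for large exports]\n"
    "Then the export is generated asynchronously\n"
)
print(extract_assumptions(spec))  # ['async for large exports']
```

This is the kind of post-processing that makes the summary list at the end of the spec cheap to build: scan once, rank by impact, present for review.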


A real example

I ran it on something I'm actually building — CLI capture for Foundation, my knowledge commons project. Foundation currently captures Claude.ai browser conversations via a browser extension. Claude Code sessions live in ~/.claude/projects/ and disappear after 30 days. I needed to capture those too.

The prompt:

```
/spec-writer Add CLI capture for Claude Code sessions stored in ~/.claude/projects/
```

Here's a slice of what came back:

From the spec:

```
Given a developer with Claude Code sessions in ~/.claude/projects/
When they run the Foundation CLI capture command
Then a list of uncaptured sessions is displayed, sorted by most recent first

Given a session already in Foundation
When the developer runs capture again
Then that session does not appear in the review list

Given a malformed .jsonl file
When the capture command processes it
Then an error is logged for that file and processing continues on remaining sessions
```
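Those three criteria translate almost directly into a scan loop. Here's a minimal Python sketch of what that capture scan could look like; the real CLI doesn't exist yet, and the filename-as-session-ID dedup key is itself one of the spec's flagged assumptions:

```python
import json
from pathlib import Path

def list_uncaptured_sessions(projects_dir: Path, captured_ids: set[str]) -> list[Path]:
    """Uncaptured session files, most recent first; malformed files are logged and skipped."""
    sessions = []
    for path in projects_dir.rglob("*.jsonl"):
        if path.stem in captured_ids:
            continue  # already in Foundation, hide from the review list
        try:
            for line in path.read_text().splitlines():
                if line.strip():
                    json.loads(line)  # validate every JSONL record
        except (json.JSONDecodeError, OSError) as err:
            print(f"skipping malformed session file {path}: {err}")
            continue  # log the error, keep processing remaining sessions
        sessions.append(path)
    return sorted(sessions, key=lambda p: p.stat().st_mtime, reverse=True)
```

Each acceptance criterion maps to one branch: the dedup check, the malformed-file handler, and the mtime sort.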

From the plan:

```
CLI capture runs locally — it needs to read the local filesystem, which
Cloudflare Workers cannot do. [ASSUMPTION: CLI capture is a local Node.js
or Bun script that calls the Foundation API to insert sessions, rather
than a Worker itself]
```

That assumption stopped me. I hadn't thought through the architecture split — Foundation runs on Cloudflare Workers, which can't read local filesystems. The skill surfaced it before I'd written a line of code.
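The split that assumption describes is simple to sketch: read locally, insert remotely. A hypothetical local script (Python here for brevity, where the spec assumes Node.js or Bun; the endpoint URL and payload shape are invented for illustration):

```python
import json
import urllib.request
from pathlib import Path

# Hypothetical endpoint — the real Foundation API route isn't shown in this post.
FOUNDATION_API = "https://foundation.example.com/api/sessions"

def build_payload(path: Path) -> bytes:
    """Turn a local .jsonl session file into a JSON body for the API."""
    records = [json.loads(line) for line in path.read_text().splitlines() if line.strip()]
    return json.dumps({"session_id": path.stem, "messages": records}).encode()

def capture_session(path: Path) -> None:
    """POST one session to Foundation; the Worker does the insert server-side."""
    req = urllib.request.Request(
        FOUNDATION_API,
        data=build_payload(path),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
```

The local script owns filesystem access; the Worker only ever sees an HTTP request. That's the architecture split the assumption surfaced.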

The assumptions summary:

```
1. CLI capture is a local script calling the Foundation API, not a Worker
   Impact: HIGH
   Correct this if: you want a purely serverless approach

2. Manual curation before capture, not automatic bulk import
   Impact: HIGH
   Correct this if: you want automatic background capture

3. Session ID from .jsonl filename is the deduplication key
   Impact: MEDIUM
   Correct this if: session IDs are stored differently in your schema

4. No sensitive data scrubbing in v1
   Impact: MEDIUM
   Correct this if: your sessions contain credentials or keys
```

Four assumptions, two of them high-impact. The architecture one I would have hit mid-implementation. The sensitive data one I wouldn't have thought about until someone complained.
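That fourth assumption is the easiest one to act on once it's visible. A hedged sketch of what a scrubbing pass could look like; the patterns are illustrative only, and real scrubbing needs more than a regex sweep:

```python
import re

# Illustrative patterns — real credential detection needs a proper secrets scanner.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),              # API-key-shaped tokens
    re.compile(r"(?i)(password|secret)\s*[:=]\s*\S+"),  # key: value credential pairs
]

def scrub(text: str) -> str:
    """Replace credential-shaped substrings before a session leaves the machine."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Even a crude pass like this runs client-side, before anything reaches the API, which is where the assumption says the decision belongs.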

That's the value. Not the spec itself. The visible assumptions.


How it fits into SDD

spec-writer gets you to Spec-First immediately, with no ceremony. One command, full output, correct the assumptions, hand to the agent.

Julián describes two levels beyond that. Spec-Anchored: the spec lives in the repo and evolves with the code. Spec-as-Source: the spec is the primary artifact, code is regenerated to match. If you want to move toward Spec-Anchored, save the output to specs/feature-name.md in your repo. The skill produces something worth keeping.

The methodology is Julián's. The skill is the friction remover.


Install

```
mkdir -p ~/.claude/skills
git clone https://github.com/dannwaneri/spec-writer.git ~/.claude/skills/spec-writer
```

Then:

```
/spec-writer [your feature description]
```

Works across Claude Code, Cursor, Gemini CLI, and any agent that supports the Agent Skills standard. The same SKILL.md file works everywhere.

The repo is at github.com/dannwaneri/spec-writer. The README has the full output format and a worked example.


The agents are getting better at implementing. The bottleneck was always specification — knowing what to build precisely enough that the agent doesn't have to guess. spec-writer doesn't remove that requirement. It makes it faster to satisfy.

The spec isn't the output. The assumptions are.


Built on the Spec Driven Development methodology — operationalized by tools like GitHub Spec Kit and OpenSpec. Julián Deangelis's writing on SDD at MercadoLibre was the direct inspiration.

Other skills: voice-humanizer — checks your writing against your own voice, not generic AI patterns.

Top comments (18)

leob

Sounds useful!

Daniel Nwaneri

Try it on your next feature and let me know what assumptions surface.

leob

Thanks man - bookmarked this and archived it in my "AI - All The Stuff I Should Really Check Out" folder - will let you know when I get around to it!

Daniel Nwaneri

That folder name is doing a lot of work. The assumptions list is the part worth seeing in practice. It's not obvious until you watch the skill surface a decision you would have left implicit. When you get around to it, run it on something you're actually building, not a toy example. 👊👊

leob

Gotcha - toy examples tend to be artificial, or simplistic ...

Daniel Nwaneri

Exactly.

Daniel Nwaneri

The assumptions that matter are the ones you didn't know you were making, and those only show up when the domain is real. A toy example has no legacy constraints, no auth model you didn't pick, no business logic the agent has to guess at. The skill earns its keep on the second or third feature in a codebase you've been living in for months.

KC

If AI agents can even perform spec-driven development as well, I wonder how the role of developers will shift...

Daniel Nwaneri

The shift is already happening - away from writing code toward writing the contract that defines what correct looks like. The spec is the thing agents can't generate without you because it requires knowing what your system is supposed to do, what your business constraints are, what failed last time. The agent can implement from a good spec. It can't produce the spec without the understanding that comes from having built the system and watched it fail.
The role isn't disappearing. It's moving upstream.

Arend-Jan Van Drongelen

Just used this as my first skill to add a new 'birthday' feature to my Telegram bot. Very useful. I'm coding on my phone in a Termux-Ubuntu environment with Claude Code and Codex, so it's very easy to miss the unknown assumptions and end up in a long conversation to fix the problem. This fixes the problem up front :)

Even with a small feature like this it helps. For example, it showed the commands it wanted me to type, but I want more of a free format - fixed. Thanks for this!

Daniel Nwaneri

Termux on a phone is exactly the environment where the assumption problem bites hardest. The iteration cost is high enough that one wrong assumption can cost you an hour. Glad it caught the commands-vs-free-format issue before you were mid-conversation.
That free-format flag is good feedback on the skill itself. Defaulting to explicit commands made sense for the use case I built it around, but it's an assumption the skill makes without asking, which is ironic given what the skill is supposed to do. Worth adding a prompt at the start that asks about output format preference before generating the spec.
What's the birthday feature doing?

Arend-Jan Van Drongelen

It gives updates in the morning with a birthday reminder. It can create an overview, add or remove birthdays, and sync with the calendar. I plan to add a personalized card feature.

Daniel Nwaneri

Calendar sync on a phone in Termux is genuinely impressive and a personalized card feature on top of that is the kind of thing that makes a utility actually feel like it was built for someone.

Cyber Safety Zone

This is a really interesting shift from “prompting” to actually encoding thinking into reusable skills. Instead of repeating instructions every time, you’re building a system that compounds knowledge—exactly what modern AI workflows need. It also reinforces how critical clear specs are, since AI performs best when the intent is precise and structured

Daniel Nwaneri

"Encoding thinking" is the right frame and it's why the output format matters as much as the prompt. A spec that marks its assumptions explicitly with [ASSUMPTION: ...] forces the thinking to be visible, not just the conclusion. The AI didn't decide those were assumptions. You did. That's the difference between a reusable artifact and a one-time answer.

Harsh

The key insight here is brilliant: correcting a draft is faster than answering questions cold. This is exactly why Q&A-first tools fail: they assume you know what you don't know. By generating the spec first and flagging assumptions inline, you're making the unknown unknowns visible. That's a fundamentally better UX for spec writing. The architecture assumption surfacing in your example proves the point perfectly.

Daniel Nwaneri

"Unknown unknowns" is exactly the framing. Q&A assumes you can enumerate the gaps before you've seen the spec which is precisely when you can't. The generate-first approach inverts that: produce the draft, let the assumptions surface, then correct only what's wrong.
The architecture assumption in the Foundation example was the one that would have hit hardest mid-implementation. Cloudflare Workers can't read local filesystems - obvious in retrospect, invisible before the spec named it. That's the category that costs the most: not bugs you can see in the diff, but constraints you didn't know applied until the agent was already 3 hours into the wrong approach...

The Q&A tools that fail are optimizing for first-draft cleanliness. spec-writer optimizes for assumption visibility. Those are different goals, and only one of them scales with complexity.

Mountaga Diallo

Great project, congrats
