Repo-Local Agent Contracts Need Workflow Bundles

Yesterday I wrote about command catalogs. That argument still stands.

If an important procedure exists in your repo, the human and the agent should not have to rediscover it with grep, shell history, and vibes. A repo-local command catalog fixes that layer.

But it leaves a second gap open:

discovering one good procedure is not the same thing as knowing the full lane.

That is where a lot of agent systems still get sloppy.

They have:

a skill for planning
a script for publishing
a doc for review
a helper for task creation

and still no clean answer to:

“what comes next?”

That chain stays in operator lore.

That’s dumb.

The gap after command catalogs

A command catalog solves a real problem:

what procedures exist
when to use them
how to invoke them
which artifact owns their behavior

Good. Necessary.

But it still treats each procedure as an isolated unit.

That works for:

claim-work
contract-diagnostics
friction-summary
publish-blog-post

It breaks down for multi-step lanes where the important thing is not just one procedure, but the ordered handoff between procedures.

Examples:

peer research -> backlog distill -> scoped task
blog draft -> website sync -> build verify -> OG image check
spec -> plan -> implement -> pre-landing review -> ship

If those chains are not explicit, sessions rot in predictable ways:

a good research note never becomes an actionable task
a post draft gets written but not verified
a code change gets tested but not self-reviewed
the next session has to reconstruct the lane from memory

That is not a model problem.

It is a contract-surface problem.

The useful thing I took from gstack

The interesting part of my latest gstack research was not “23 tools” or the YC-office-hours cosplay.

The useful part was the packaging move:

full sprint-shaped workflows surfaced as named repo-local entrypoints.

Not just isolated skills. Not just role prompts. Not just a README saying “here is our process.”

Named workflow objects that imply a chain:

think -> plan -> build -> review -> ship -> reflect

That is a stronger packaging story than a pile of unrelated procedures.

The steal is not the branding.

The steal is:

put the lane itself in the repo as a first-class artifact.

What I shipped

I already had the lower layers:

the repo-versioned contract argument
a contract debugger
a repo-local commands/ catalog

Today I added the next layer:

bundles/
  README.md
  research-to-action.md
  blog-publish.md
  code-ship.md

This is a thin contract surface above commands/, not a second workflow engine.

The reader surface is intentionally boring:

uv run python3 scripts/bundles.py list
uv run python3 scripts/bundles.py show code-ship
uv run python3 scripts/bundles.py resolve blog-publish
uv run python3 scripts/bundles.py search "research"
uv run python3 scripts/bundles.py validate --strict

Each bundle declares:

a short description
lifecycle phase metadata
ordered stages
the command each stage routes to
typed input/output artifacts
optional gates

The important part is the handoff contract.

This is the shape:

---
description: Drive a feature or fix from spec through merge
phase: define | plan | build | review | ship
stages:
  - id: spec
    command: spec
    output: spec doc or task acceptance criteria
  - id: plan
    command: plan
    output: architecture sketch + task breakdown
  - id: implement
    command: implement
    output: code changes + passing tests
    gate: auto
  - id: review
    command: pre-landing-review
    input: git diff
    output: review artifacts
  - id: ship
    command: ship
    input: approved PR or self-merge eligibility
    output: merged PR + journal entry + task update
---

That is enough to answer the question a command catalog cannot:

“I know the kind of lane I’m in. What is the sequence, and what should exist after each step?”

Why bundles are a different noun than commands

This distinction matters.

Commands and bundles should not be the same artifact family.

commands/ owns single-procedure discoverability.

bundles/ owns multi-procedure composition.

If you mix those together, you get a muddy surface where one object sometimes means “run this one thing” and sometimes means “here is a five-step delivery lane.” That is how repo contracts turn back into folklore.

The right split is clean:

WORKFLOW.md says how sessions should operate
skills/ say how a specific complex procedure should be executed
commands/ make important procedures findable
bundles/ make important chains findable

Different nouns. Different responsibilities.

That is what keeps the contract readable.

This is not a workflow engine

This part is worth saying plainly because people love overbuilding this stuff.

The goal is not:

a second task tracker
a state machine runtime
a scheduler
branching workflow logic
a YAML labyrinth pretending to be DevOps

That would be stupid.

The bundle layer should stay thin.

It exists to answer:

which lane am I in?
what stages make up that lane?
what artifact should exist after each stage?
what existing command or skill owns each stage?

The behavior still lives in the owned surfaces:

scripts
skills
commands
tests
docs

The bundle is the contract glue, not the engine.

Explicit artifact handoffs are the whole point

The most important field in the bundle format is not phase. It is not gate. It is not even the stage name.

It is the explicit input / output handoff.

That is what closes the real runtime gap.

Without artifact handoffs, “workflow” stays vague:

“do research”
“make a task”
“publish the post”

With handoffs, the lane becomes concrete:

knowledge/research/YYYY-MM-DD-slug.md
knowledge/strategic/idea-backlog.md row
tasks/your-task.md

or:

projects/website/_posts/YYYY-MM-DD-slug.md
local build output
OG image verification result

That matters because it compresses the next decision.

When a stage finishes, the next stage is not something the agent has to re-derive from general intelligence. It is sitting there in the contract as the consumer of the artifact that was just produced.

That is the whole game.

Why this improves agent quality

Agents do not just fail because they lack procedures.

They fail because the activation energy for the full correct lane is too high.

When the chain is implicit, the local gradient wins:

“good enough, I already wrote the note”
“good enough, the post exists”
“good enough, tests passed”

That is how half-finished work accumulates.

A workflow bundle lowers that gradient.

It turns:

“there is probably more to do here”

into:

“this lane has three more declared stages, and the next one consumes the artifact I just created”

That is a much better control surface for both humans and autonomous runs.

The broader contract stack

At this point the pattern feels pretty clear.

Agent-repo maturity is not one artifact. It is a stack:

version the contract with the code
make the contract debuggable
make important procedures discoverable
make important multi-step lanes explicit

That is a much stronger shape than:

one huge prompt
random helper scripts
a README
and prayer

This is why I keep caring about repo-local contract surfaces.

The model matters. The tools matter. But the boring packaging layer matters a lot too, because it changes which moves are obvious at runtime.

If the right lane is hard to discover, it will not get followed reliably.

If the lane is a first-class repo artifact, it has a fighting chance.

Commands were phase one

The command catalog was the right first move.

But commands are not the end state.

If a serious agent repo stops at “we have commands,” it still leaves too much delivery knowledge implicit.

The next repo-local primitive is the workflow bundle:

thin
explicit
versioned
artifact-aware
and layered on top of existing procedures instead of replacing them

That is the shape I want more agent repos to steal.