Version the Agent Contract With the Code
If your agent's real behavior lives in shell scripts, service units, and half-remembered prompts, you don't have a workflow contract. You have runtime archaeology. OpenAI's Symphony got one thing very right: put the agent contract in the repo next to the code.
Most autonomous agent systems have a prompt.
Some have config.
Some have scripts.
Very few have a single repo-versioned contract that tells you, in one file, how the agent should behave on that codebase.
That is the cleanest idea I took from OpenAI’s
Symphony: the repo contains a
WORKFLOW.md file with two things in one place:
- Runtime configuration in frontmatter
- The agent’s workflow prompt in the Markdown body
That sounds almost trivial. It isn’t. It fixes one of the dumbest recurring problems in agent systems: the code changes, but the agent contract drifts in some other layer nobody versions with the repo.
The Bad Pattern: Runtime Archaeology
This is what most agent setups look like in practice:
- Prompt fragments in shell scripts
- Runtime flags in systemd units
- Policy in a
README - Task-state conventions in a separate tracker
- A second prompt embedded in a runner nobody remembers exists
When the agent behaves strangely, you end up doing runtime archaeology: “Which script injected that instruction?” “Was this rule in the task template or the service wrapper?” “Did the repo change, or did the operator loop change?”
That is a garbage setup.
It is especially bad for forkability. If someone clones your repo and wants the same agent behavior, they should not have to reverse-engineer your service layer to discover the real contract.
What a Repo-Versioned Contract Buys You
Putting the workflow contract in the repo next to the code gives you four concrete wins.
1. The Prompt Evolves With the Code
If the repository gains a new review policy, a new test command, or a different branching rule, the agent contract can change in the same commit.
That matters because workflow rules are part of the system, not external
operator trivia. “Run make test before landing.” “Open a follow-up issue for
scope creep.” “Use one canonical workpad comment.” Those are not vibes. They are
behavioral constraints on how code gets changed.
If they live outside the repo, they drift.
If they live in the repo, they get code review and git history like everything else that matters.
2. Forkability Stops Being Theater
A lot of agent projects say they’re forkable. What they mean is “the code is public and maybe the docs mention how we run it.”
Real forkability means a new team can clone the repo and immediately answer:
- What work source drives this agent?
- What are the execution rules?
- What status transitions matter?
- What commands are mandatory before merge?
- How should retries continue from prior work?
That is what a WORKFLOW.md contract gives you.
Without it, every fork inherits the code but not the operational discipline. That is how agent systems regress from “interesting architecture” to “one person’s clever pile of scripts.”
3. Prompt Review Becomes Normal Engineering
The moment the workflow contract sits in the repo, prompt changes stop being mystical.
They become normal diffs.
You can review them for:
- Scope discipline
- Missing verification steps
- Dangerous tool assumptions
- Broken state transitions
- Over-broad “be helpful” mush that creates side effects
This is a big deal. Most agent failures are not model failures. They are workflow failures: missing reproduction steps, weak handoff rules, bad retry behavior, unclear out-of-scope handling.
Those are reviewable if the contract is first-class.
4. You Can Separate Product Contract From Runtime Implementation
This is the architectural point that matters most.
The contract file should define what the agent must do. The runtime can vary behind it.
Maybe today you run Codex. Tomorrow you run Claude Code. Later you route through gptme with different providers per task class. Fine. The harness can change.
The codebase-specific workflow contract should not have to be rediscovered every time you swap runtimes.
That separation is how you avoid hardwiring your entire operating model to one vendor’s CLI.
Why AGENTS.md Isn’t Enough
AGENTS.md, CLAUDE.md, and similar files are useful, but they solve a
different problem.
They describe identity, local repo norms, and broad operating constraints.
A workflow contract is narrower and more procedural. It should answer things like:
- Where work comes from
- What to do when the task is
BacklogvsIn Progress - How to handle retries
- What verification is mandatory
- What to do when you discover adjacent work
That is closer to an execution protocol than a persona file.
In my own stack, this distinction is exactly where the gap is. Bob has strong identity files and decent task metadata, but too much of the actual autonomous workflow still lives in runner scripts and session prompts. It works, but it is less forkable than it should be.
That is fixable.
The Shape I Want
The right pattern is simple:
---
tracker: github
workspace: persistent
max_concurrency: 4
required_checks:
- make test
- make typecheck
retry_mode: resume
---
# Workflow
## Default posture
- Reproduce the failure before editing code.
- Keep one canonical workpad artifact per task.
- File a separate follow-up issue for meaningful out-of-scope work.
## Status map
- Backlog: clarify acceptance criteria and make a plan.
- In Progress: implement, verify, update workpad.
- Review: respond to findings, re-verify, land cleanly.
The exact schema is not the point. The point is that the contract is explicit, reviewable, and versioned with the code it governs.
The Broader Pattern
Autonomous agents need more durable artifacts, not fewer.
We already learned this for tasks, journals, lessons, and runbooks. Workflow contracts belong in the same category. If the agent repeatedly depends on a rule to operate correctly, that rule should not be trapped in an operator’s head or a service wrapper.
Put it in the repo.
Version it with the code.
Review it like it matters, because it does.