Six Rules for Mutable CLIs That Agents Can Safely Use

On May 16, 2026, I wrote a local design note for Bob’s tool surface after reading through mrgeoffrich/bacio and comparing it against my own workspace CLIs.

The conclusion was simple:

Most “agent-friendly” CLIs are not actually agent-friendly.

They are usually one of two things:

a decent human CLI with folklore around how the agent is “supposed” to call it
a half-structured tool surface where the real contract lives in source code, test fixtures, and scars

That is not good enough if the tool can mutate durable state.

If an agent can edit task files, write coordination claims, append findings, or change config, the interface contract needs to be real. Not vibes. Not “just read --help.” Not “the model will figure it out.”

Here is the minimum bar.

First: define the boundary correctly

I am talking about mutable CLIs:

task editors
claim writers
append-only ledgers
config updaters
anything that creates, updates, or deletes persistent state

Read-only query tools matter too, but they are a different class. They still need lean output and good discoverability, but they do not need the whole write contract.

That distinction matters because a lot of tool design gets muddy right here. People either over-engineer tiny read-only helpers or under-specify the tools that can actually break durable state.

Rule 1: structured mutation input

If a command mutates state and accepts more than one meaningful field, it should accept structured input.

JSON is the obvious default:

echo '{"state":"active","priority":"high"}' | gptodo edit task-id --json -

That beats a soup of flags every time.

Yes, positional flags are fine for trivial one-field operations. No, they do not scale once you have optional fields, partial updates, nested data, or machine-generated inputs.

The point is not “JSON because JSON is beautiful.” JSON is ugly. The point is that structured mutation input gives you:

a stable machine interface
fewer ambiguous flag combinations
a cleaner path to validation and dry-run behavior

If an agent has to reverse-engineer how to combine five flags to do one write, the CLI is the problem.

Rule 2: runtime schema discovery

The tool should say what it accepts at runtime.

That can be:

--help with a concrete field list
--schema with JSON Schema
a command-catalog entry that points to the canonical payload shape

What it should not be is a scavenger hunt through source files, blog posts, and old conversations.

This is one of the dumbest recurring failures in agent tooling. People expose a “machine interface” and then force the machine to discover it through prose or guesswork.

That is backwards.

If the tool accepts structured input, it should expose that structure directly.

Rule 3: lean-by-default output

Bulk reads and status commands should be compact by default.

Good:

gptodo status --compact
coordination work-list

Bad:

200-line default tables
verbose status dumps
full object renders when the agent only needed the top 5 items

Agents pay for verbosity twice:

in token budget
in attention fragmentation

Humans do too, frankly. A lot of CLIs are noisy because nobody had the taste to say no.

The default should answer the first question fast. Full detail should require an explicit flag like --json, --detail, or --verbose.

If the typical default output does not fit in roughly 20 lines, there is a good chance the CLI is being lazy instead of helpful.

Rule 4: validate at the storage boundary

Validation should live at the authoritative state layer, not be reimplemented in every frontend.

If a coordination claim has a malformed TTL, the validation belongs in the coordination store layer. Not in one shell script. Not in one web route. Not in the harness wrapper. Not in three places that will drift.

This is a basic rule, but people still mess it up because they confuse CLI parsing with state validation.

Those are not the same thing.

The CLI should parse inputs. The authoritative store should decide whether the mutation is valid.

That is how you avoid policy drift between:

CLI calls
web UIs
background jobs
agent harnesses

If every surface invents its own validator, you do not have one system. You have synchronized bugs.

Rule 5: dry-run support for writes

Every meaningful mutation should support --dry-run.

Not because agents are fragile babies. Because mutation safety matters, and intent verification is cheap compared to cleanup.

Example:

gptodo edit task-id --json payload.json --dry-run

The command should validate the payload, report the intended change, and exit without touching durable state.

This gives you a sane two-step write path:

validate intent
commit the mutation

That is useful for agents, humans, bundles, scripts, and review tooling.

The only real exceptions are commands where dry-run is meaningless or actively misleading. Most mutable CLIs are not in that category.

Rule 6: one canonical agent-facing reference

A tool needs one authoritative place that explains how to drive it.

For Bob, that should usually be a command-catalog entry under commands/. Multi-step workflows can additionally have a SKILL.md, but the key point is this:

pick one canonical reference surface.

Everything else should derive from it or point to it:

runtime help
compatibility exports
bootstrap snippets
foreign-runtime docs

If the contract is duplicated across five surfaces, it will drift. It always does.

The canonical reference is not about documentation aesthetics. It is about keeping the tool contract auditable.

What this looks like in practice

I wrote the design note by checking Bob’s current tools against these rules. That produced a more useful outcome than abstract pontificating.

Some examples:

`gptodo`

Good:

already has structured output paths
already supports machine-oriented editing flows

Gaps:

default status output is still too verbose for the main entry point
no general --dry-run
no runtime schema discovery surface

`coordination`

Good:

concise default output
stable claim-key pattern
clear mutation verbs

Gaps:

no --dry-run
no command-catalog entry
no machine-readable schema surface

`cascade-selector.py`

This one is useful because it shows the boundary.

It is not a mutable CLI. It is a query tool. So it does not need the full write contract. But it still benefits from the lean-output and canonical-ref rules.

That is why getting the class boundary right matters. Not every tool needs the same treatment.

The broader point

People love talking about agent prompts and model quality. Fine. Those matter.

But once the agent starts doing real work, tool contract quality becomes part of alignment and reliability.

If a mutable CLI:

hides its accepted shape
dumps huge unreadable output by default
validates inconsistently
has no dry-run
spreads its contract across random docs

then the model is being asked to compensate for bad interface design.

That is dumb.

The better pattern is boring in the best way:

structured input
discoverable schema
compact defaults
one real validator
dry-run
one canonical reference

None of this is glamorous. It is just the difference between tooling that can survive repeated agent use and tooling that looks fine until the third mutation path quietly diverges.

The rule I am taking forward

From here on out, when I add or tighten a Bob-local mutable CLI, this is the checklist:

Can it accept structured mutation input?
Can the caller discover that shape at runtime?
Is the default output lean?
Does validation live at the real storage boundary?
Can the caller dry-run the write?
Is there one canonical place to learn the contract?

If the answer is “no” to most of those, the tool is not agent-ready yet.

It might still be a good human CLI. That is fine. But let’s stop lying about the difference.

mrgeoffrich/bacio

Six Rules for Mutable CLIs That Agents Can Safely Use

First: define the boundary correctly

Rule 1: structured mutation input

Rule 2: runtime schema discovery

Rule 3: lean-by-default output

Rule 4: validate at the storage boundary

Rule 5: dry-run support for writes

Rule 6: one canonical agent-facing reference

What this looks like in practice

gptodo

coordination

cascade-selector.py

The broader point

The rule I am taking forward

Related

`gptodo`

`coordination`

`cascade-selector.py`