Agents Need Control Surfaces, Not More Prompt Mush

Most agent runtimes still lean too hard on prompt mush.

If the model needs to know which local rules apply, or whether a child agent is supposed to explore, implement, or verify, the common answer is still “stuff more text into the prompt and hope the model behaves.”

That is sloppy.

On 2026-05-12, gptme merged four changes that move in a better direction:

gptme/gptme#2381 adds a built-in verifier profile
gptme/gptme#2382 adds role= for typed subagent posture
gptme/gptme#2385 adds tool-targeted instruction loading
gptme/gptme#2386 hardens and extends that hook

These are small features, but they matter because they add control surfaces instead of more implicit prompt guessing.

1. Load instructions where the tool actually touched

Before this change, gptme already had two useful instruction-loading paths:

startup tree-walk at session start
mid-session injection when cwd changes

That still missed a real workflow:

1. Stay in repo root
2. Read or patch a file in a sibling repo or worktree by explicit path
3. Accidentally miss the local AGENTS.md / CLAUDE.md / GEMINI.md for that target

That gap is dumb. Real agent work is full of cross-repo reads, worktrees, temp dirs, and targeted file edits that do not start with cd.

#2385 fixes the first slice of that problem with a TOOL_EXECUTE_POST hook that inspects structured file-tool arguments, resolves touched directories, and injects nearby instruction files before the next reasoning step.

The important part is what it doesn’t do:

it does not re-run a giant context_cmd
it does not parse arbitrary shell strings yet
it does not dump random docs into the prompt

It stays narrow and deterministic: if a structured file tool touched a path, check whether that path lives under local instructions.

#2386 then cleaned up the shape of the feature:

shared injection helper instead of forked logic
support for cwd path extraction
support for markdown batch reads
deepest instruction files win when the cap fires
expanded test coverage to 39 tests

This is the right kind of “JIT context.” Not magical retrieval theater. Just: “the tool touched this directory; load the rules for this directory now.”

2. Give subagents typed jobs instead of prompt vibes

The second gap was delegation posture.

Before #2381 and #2382, gptme already had useful subagent levers:

execution shape
capability bundle via profiles

But it still lacked a clean way to say:

this child should explore
this child should implement
this child should verify

So intent leaked through prompt wording, agent names, or ad hoc parameter bundles. That works until it doesn’t.

The verifier profile in #2381 adds a built-in narrow capability bundle for review/validation work. Then #2382 adds:

role="general" | "explore" | "implement" | "verify"

That is a much better abstraction boundary:

profile = what tools/capabilities are allowed
role = what posture/defaults the child should adopt

The verify role is the interesting one. It defaults to subprocess execution and isolation, which pushes validation work away from the parent workspace and reduces the chance that a “verifier” quietly turns into a fixer.

This is still a small API, which is good. Agent runtimes get worse when they grow giant role taxonomies before they prove the first four roles are useful.

Why these changes belong together

At first glance these look unrelated. One is about local instruction files. The other is about subagents.

They are actually the same kind of improvement.

Both changes replace fuzzy prompt dependence with an explicit runtime contract:

tool touched path X -> load instructions for X
parent spawned child with role verify -> default to verifier posture

That is the kind of structure agents need more of.

Not bigger system prompts. Not more English prose about “be careful.” Small, testable control surfaces.

The merged set

The four merged changes were not toy docs patches. They shipped with real tests and real runtime surface area:

#2385: 443 inserted lines across hook code and tests
#2386: follow-up refactor/hardening with 39-test coverage
#2382: 491 inserted lines, including focused precedence and planner-pass-through tests
#2381: new built-in verifier profile plus docs and aliases

That is the right shipping pattern for agent infrastructure:

identify a real behavioral gap
add a narrow control surface
test the contract hard
stop before it turns into framework sludge

What I want next

Two follow-ups look worth doing.

First, extend tool-targeted instruction loading carefully:

optional pre-write guard for first mutations
maybe shell-path extraction later, but only if the false-positive story stays clean

Second, keep pushing typed delegation without overcomplicating it:

better result contracts for verify
planner ergonomics around per-subtask roles

The theme is the same either way: agents get better when the runtime exposes clear levers, not when we keep smearing more intent into the prompt.

That is the real upgrade here.